Discrete Math - The MKD Remix (CSC230 Version)

About this text

This revision of the book was last updated on December 8, 2025.

This is an old revision of the Remix. The current revision can be viewed here.

CAUTION - CHAPTER UNDER CONSTRUCTION!

NOTICE: If you are NOT enrolled in the CSC 230 course that I teach at SFSU, you probably are looking for the “public version” of the book, which I only update between semesters. The version you are looking at now is the “CSC230 version” of the book, which I will be updating throughout the semester.

This chapter was last updated on June 19, 2025.

This work, "Discrete Math - The MKD Remix" by Mark Kelly Davis, is adapted from “Discrete Math” by Mohamed Jamaloodeen, Kathy Pinzon, Daniel Pragel, Joshua Roberts, and Sebastien Siva. “Discrete Math” is used under CC BY-NC 4.0. All files for “Discrete Math” are available at its associated GitHub repository.

This work, "Discrete Math - The MKD Remix", is licensed under CC BY-NC 4.0 by Mark Kelly Davis.

If you were a student in my course during the Spring 2024 semester, the version of the book you used can be found here.

If you were a student in one of my courses during the Fall 2024 semester, the version of the book you used can be found here.

If you were a student in one of my courses during the Spring 2025 semester, the version of the book you used can be found here.

If you were a student in one of my courses during the Fall 2025 semester, the version of the book you used can be found here.

How does the Remix differ from the original work?

The remixer’s goal is to adapt the original “Discrete Math” to create an OER textbook for a one-semester Discrete Mathematics/Discrete Structures course for Computer Science students who have some experience coding in Java, Python, or another high-level programming language, and have completed an Algebra Ⅱ-level (or equivalent) mathematics course. The Remix is a work-in-progress that will continue to evolve over time toward this goal.

Use of materials from the Discrete Mathematics Project

The Remix is designed to work with the activity-based lesson materials developed by the Discrete Mathematics Project (DMP). See the Appendix: For Instructors for information on how the remixer uses these lessons with the text.

Alignment To Standards

The remixer’s intent is that the Remix:

address topics and learning outcomes listed for
- ACM/IEEE-CS/AAAI Joint Task Force on Computer Science Curricula "Computer Science Curricula 2023" MSF-Discrete: Discrete Mathematics
- ACM/IEEE-CS Joint Task Force on Computing Curricula "Computer Science Curricula 2013" Discrete Structures (DS) but omitting the DS/Discrete Probability topics and learning outcomes.
  Note: ACM CCECC "Computer Science Curricular Guidance for Associate-Degree Transfer Programs with Infused Cybersecurity" Discrete Structures Knowledge Area (DS) is a list of learning outcomes and assessment rubrics that does not specify topics, so the remixer preferred the 2013 document to that June 2017 update.
- California Community Colleges (CCC) C-ID Math 152: Discrete Structures content areas Ⅰ through Ⅴ, omitting area Ⅵ. Discrete Probability.
  Note that the content listed for C-ID Math 152: Discrete Structures is identical to the topics listed for ACM/IEEE-CS CS2008 Review Taskforce "Computer Science Curriculum 2008: An Interim Revision of CS 2001" Discrete Structures (DS).
- additional topics from the ACM CCECC Software Engineering curriculum 2005 Discrete Structures Mathematics course.

Other Considerations in the Remix

In addition to using activity-based lessons and aligning to the standards, the Remix aims to:

use terminology, definitions, notation, and symbols that align with commonly-used textbooks such as Kenneth H. Rosen’s Discrete Mathematics and Its Applications, 8th Edition.

give students the opportunity to learn new content based on what they already know, then to move toward building a more formal understanding of the content (e.g., pointing out that the set of odd integers and the set of even integers are the two equivalence classes of an equivalence relation, and that the rules for adding and multiplying "odds" and "evens" are an example of modular arithmetic).

organize the content in "chunks" for ease of reading and digestion of new ideas
refer to original sources (e.g., Pascal’s handwritten triangle as well as earlier non-European references to and uses of the number triangle.)

If you are looking for an OER textbook for a Discrete Mathematics course intended primarily for Mathematics majors (e.g., one that does not include topics like analysis of algorithms and binary tree traversal), many suitable ones exist. For example, see Oscar Levin’s Discrete Mathematics: An Open Introduction, 4th edition.

About the use of Python in the Remix

The Remix is intended for a course that does not require programming. Python is not part of the course content.

The original “Discrete Math” uses Python code samples throughout the textbook and includes "Introduction to Python" as its 3rd chapter. The Remix repurposes this content: Code samples in the Remix are used as "pseudocode that can run on a computer," with coding that uses "just enough Python" to illustrate important abstract ideas and concepts. Most of the existing Python examples were altered, and many new Python examples were introduced throughout the Remix. Note that, in order to illustrate concepts and ideas in the style of pseudocode, much of the Python code shown in the Remix avoids using built-in functions and often uses less efficient data structures and algorithms! For example, in the chapter "Algorithms & Big-O", code samples for sorting and searching avoid using built-in Python functions in order to illustrate all steps needed by the algorithm. In many cases, a comment can be found near a non-optimal code example that explains or illustrates a more Pythonic way of coding.

Partial list of changes made (or to be made) to the Remix.

Terminology, definitions, notation and symbols were changed throughout the Remix to align with other commonly-used textbooks. For example, the Remix defines the set of natural numbers $\mathbb{N}$ to include the integer 0 as an element; this definition is very common and is in fact a "standard" that appears in International Standard ISO 80000-2:2019, Quantities and units — Part 2: Mathematics.
In the chapter "Introducing Discrete Mathematics," informal definitions of foundational mathematical ideas needed in the course are introduced. This is done so that learners can see what they do (or do not) already know and create the necessary basis to learn the course content. In addition, a new Appendix, "On-Demand Math Resources" was written which includes material that learners can refer to as needed.

The original chapter "Introduction to Python" was moved to the appendices.
The original chapter "Counting" was split into two chapters, "Counting: Arithmetic Techniques" and "Counting: Permutations And Combinations". The first of these chapters is placed near the beginning of the book, but the second is placed much later, after sequences and recurrence relations have been discussed.
The order of the chapters "Set Theory" and "Logic" was swapped. New material was inserted into each of the two chapters.
A new chapter, "Proofs: Basic Techniques," was written and inserted after "Logic."
The chapter "Number Bases" is based on the original chapter "Number Theory," but the content on divisibility, congruence, and modular arithmetic was moved into the remixed chapters "Introducing Discrete Mathematics" and "Relations."
The chapter "Sequences and Recursion" is based on the original chapter "Sequences, Recursive Definitions, and Induction," which was split into two new chapters, "Sequences and Recursion" and "Proofs: Mathematical Induction." "Sequences and Recursion" appears before and as a lead-in to "Functions" since sequences are a special case of functions and recursion is often used to define functions.
The chapter "Functions" was moved to its new position, several chapters after "Set Theory." This was done for the following reasons:
- The learner is expected to have a basic working understanding, from previous classes, of the one-to-one correspondence concept: A unique pairing of each element in one set with elements in another set.
- The learner is expected also to have a basic working understanding of the function concept: A rule/mapping/association that takes certain objects as inputs and assigns each such input to exactly one output object.
- It is likely that the learner has some ability to work with function notation and operations such as composition and inversion of functions from previous mathematics courses.
- The remixer felt that a precise, formal definition of function, as well as properties such as injectivity and surjectivity, could be delayed until after learners had used their previous knowledge of functions.
A new chapter, "Relations," was written to include topics listed in the ACM/IEEE-CS/AAAI and CCC C-ID courses but absent from the original work, and was inserted after "Functions". This chapter also includes some of the content on divisibility, congruence, and modular arithmetic from the "Number Theory" chapter of the original work.
The chapter "Proofs: Mathematical Induction" is based in part on the original chapter "Sequences, Recursive Definitions, and Induction," but the content of this chapter was heavily rewritten and new content was inserted. This chapter was placed immediately before the chapters "Rates of Growth of Functions" and "Algorithms and Their Analysis" so that mathematical induction can be viewed as a way of validating algorithms rather than as just another more complicated proof technique.
The order of the chapters "Algorithms" and "Growth of Functions" was swapped, then the title "Growth of Functions" was changed to "Rates of Growth of Functions" and the title "Algorithms" was changed to "Algorithms and Their Analysis." New content was inserted into each of the chapters and existing content was revised.
Note that algorithms and their analysis are not mentioned explicitly as topics to be included in the ACM/IEEE-CS/AAAI and CCC C-ID courses, but these topics fit naturally as a motivation to learn much of the other content of the Remix.

The original chapter "Graph Theory" was split into two chapters, "Graphs" and "Trees". Additional content will be introduced into each of the new chapters.

1. Introducing Discrete Mathematics

This chapter was last updated on July 9, 2025.

Welcome to the Remix! I hope this textbook provides you an opportunity for a stimulating and intellectually enjoyable learning experience.

Mathematics is one of human civilization’s greatest tools: It involves pattern noticing, collecting, comparing, counting, generalizing, formalizing, and abstracting. The development of mathematics is a continuing work that spans at least 5,000 years and many different peoples and cultures. This development will continue long after you and I are gone - never forget that we are living during an era that will be someone else’s Ancient History!

1.1. What is "Discrete Mathematics"?

There seems to be no universally agreed-upon definition of "discrete mathematics," but I will describe to you my understanding of what the phrase "discrete mathematics" means.

Discrete mathematics is the mathematics people use to study and understand structures that are built from individual objects in a way that the individual objects can still be treated as separate from one another within the structure. In such a structure, the individual objects can be put into categories and counted; it makes sense to ask questions like "What is the next object in the structure after this one?" or "Which other objects are the closest to this one?" (where "closest" could refer to physical distance or could mean most similar in color or size) or "How many objects in the structure have a certain property?" Discrete mathematics is a collection of tools that can be used to answer these kinds of questions.

As an example, consider a wall, a structure built from individual stones and bricks, part of which is shown in the image. The individual objects in the structure are the stones and bricks. You can still identify individual objects even though they were combined to build the larger structure, and can classify an individual object by type (either stone or brick) or by color or by how close to the top of the wall the object is. You can try to count the total number of individual objects, and you can count the number of individual objects that are next to any one object you choose. The wall is a (non-mathematical) example of a discrete structure.
Image credit: "Vintage Stone And Brick Wall" by Paul Brennan. The image is dedicated to the public domain under CC0.

Here are two more examples of discrete structures.

A family of humans can be treated as a discrete structure.
The humans in a family are seen as individuals who can be distinguished from one another. Questions like "Which humans are siblings of this human?" and "Which humans are parents of this human?" and "How many children does this human have?" make sense.
The set of integers, in the usual order, as represented on a number line is a discrete structure.
It makes sense to ask questions like "What is the next integer after -2?" or "What are the integers that are closest to -2?" for this structure.

Notice that in these examples, the individual objects are of a different nature than the entire structure: Individual bricks and stones aren’t considered a wall, individual humans aren’t considered a family, and individual integers are not a collection of integers.

So what kind of structure is not discrete?

Consider the water in the glass shown in the image. The water in the glass is a structure, but for most of human history it has not been seen as made up of objects that are of a different nature than the structure. That is, many generations of humans have recognized the difference between a wall and the individual stones and bricks that make up the wall, but likely have perceived water as being made up of… water. This is why humans tend to use measurement of quantities of water, using units such as fluid ounces or milliliters. In our current era, we humans understand that the water is built from molecules, which are of a different nature than the water "structure," but because the molecules are so tiny, numerous, and densely distributed throughout the structure, we humans (except perhaps some scientists and engineers) still use measurement instead of counting and ignore the individual molecules. For example, a recipe might call for "8 ounces of water" but never would ask for "7.6 × 10²⁴ molecules of water." Notice that the measurement units used do not correspond in any natural way to individual molecules or groupings of individual molecules (unless, perhaps, you are a scientist or engineer.)
Also, notice that the glass container itself is another structure in which the individual molecules tend to be ignored.
Image credit: "Water Glass" by Peter Griffin. The image is dedicated to the public domain under CC0.

Humans use continuous mathematics like calculus to study a structure that is built from objects that are densely distributed throughout the structure. For such a structure, measurement and approximation are more appropriate than counting.

The set of real numbers, in the usual order, as represented on a number line is NOT a discrete structure.
It does NOT make sense to ask about "the next real number after $\pi$ on the number line." Suppose we think c is "the next real number after $\pi$ on the number line." Then we can compute the number $c_{1} = \frac{c+\pi}{2},$ which is the midpoint of the interval with endpoints $\pi$ and c, so $c_{1}$ lies between $\pi$ and c and is closer to $\pi$ than c is. This means $c_{1}$ is a better candidate for "the next real number after $\pi$ on the number line" than c, so the concept does not make sense for this structure: such a number cannot exist! We can keep computing numbers that get closer and closer to $\pi$ over and over again. Likewise, "the real numbers that are closest to $\pi$" do not exist.
Instead, it makes sense to talk about "the real numbers that differ from $\pi$ by less than $\epsilon$" where $\epsilon$ is some positive real number (The symbol $\epsilon$ is the Greek letter "epsilon".) By choosing $\epsilon$ as small as we like, we can describe the real numbers that are as close to $\pi$ as needed to use as approximations to $\pi.$ This is why and how limits are defined and used in courses like precalculus and calculus, subjects that involve the real number line.
Note 1: You could use any other real number instead of $\pi$ in the discussion above because the argument will still be valid. For example, "the next real number after $-2$ on the number line" and "the real numbers that are closest to $-2$ on the number line" do not exist since you can choose numbers that get closer and closer to $-2$ such as $-1,$ $-1.5,$ $-1.75,$ and so on.
Note 2: The technique used to justify that "the next real number after $\pi$ on the number line" does not exist is called proof by contradiction and will be discussed along with other techniques in the Proofs: Basic Techniques chapter of this textbook.
Note 3: Is $\pi$ really on the number line? Click here to view an artist’s explanation about where $\pi$ lies on the number line. Be warned that some of what this artist says (about history, about $\pi$ being equal to 3.14) is not correct, but the visualization still may be helpful to you.
Note 4: FYI, the set of real numbers is called the continuum in advanced mathematics courses.
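The midpoint argument above can be tried out numerically. The following is only a sketch: the function name `closer_candidate` and the starting guess 4.0 are assumptions chosen for illustration, and Python's `math.pi` is a floating-point approximation of $\pi$, so the code illustrates the idea rather than proving it.

```python
import math

def closer_candidate(c, target=math.pi):
    """Return the midpoint of target and c, which is closer to target than c is."""
    return (c + target) / 2

c = 4.0  # hypothetical first guess at "the next real number after pi"
candidates = [c]
for _ in range(5):
    c = closer_candidate(c)  # every candidate can be replaced by a better one
    candidates.append(c)

# Print each candidate together with its distance from pi.
for value in candidates:
    print(value, value - math.pi)
```

Each printed distance is half of the previous one, and no candidate is ever "the last one" before $\pi$.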

1.2. To The Student: Some Things To Know Before You Begin

Here are some things to orient you.

1.2.1. How To Use This Textbook

The Remix is designed to build on your previous knowledge, and then build new knowledge and understanding by visiting the topics over and over again. In the next subsection, foundational mathematical ideas are discussed. You are encouraged to read through all of those foundational topics and to work through the Questions and Challenges.

There are two analogies that I, the remixer, like to use here:

Think of the course presented in the Remix as a language course.
You will use this language to talk about mathematics and computer science as you continue along your professional path. You cannot master a new language by learning some words or grammar in the first few weeks of a language course and then "forgetting" that content later, but still succeed in the course and master the language to use beyond the course. You need to assume that everything you learn in this textbook will apply later in the textbook and in your later learning. One of the goals of this textbook is to help you build a broad and rich vocabulary in discrete mathematics and a way of thinking that will apply to your future work.
Think of the course presented in the Remix as exercising at a gym.
You will build your strength and awareness about your abilities by working out. It may be tempting to watch others "demonstrate" how to do certain exercises and then choose not to do them yourself, but you are selling yourself short. A physical trainer usually knows already what they are capable of doing, so it is no compliment if you tell them "Wow, you’re really strong"… the trainer’s goal is to get you to say "Wow, I’m starting to get really strong!"

1.2.2. Foundations

Learning discrete mathematics requires putting together old ideas in new ways and adding new ideas to your mix.

This subsection highlights some of the ideas you will need to work with in order to get the most out of this textbook. Also, you’ll find opportunities here to practice with the built-in tools that are part of the textbook. Some of these ideas may be old for you, but others will probably be new. To use this textbook effectively, you’ll need to be able to work with each of these ideas with relative ease. If a few of these ideas are brand new to you, that is fine: All of these ideas will be discussed again, much more formally and in greater detail, later in the textbook.

I encourage you to read all of the topics discussed below. Don’t skip anything, even if it looks "old" because there may be some new ways of understanding the old ideas that are introduced and that will be used later in the course.

A set is an unordered collection of zero or more objects. You can think of a set as a list of the names of the objects included, but we do not care about the order of the names in the list and we do not care if the list contains duplicate names.

An example is the set of the names of the additive primary colors of light which can be written as \[\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}\] which contains exactly three elements: "blue", "green", and "red." It does not matter that "red" appears twice, and it does not matter that the order of the colors in the list "blue", "green", and "red" is different from the order used in the previous set notation. We could define the same set with any other list that contains the same three elements, for example, $\{ \text{"green"},\,\text{"red"},\,\text{"blue"} \}.$

As you will see in the Set Theory chapter, it is common practice to use uppercase English letters to stand for sets; this is similar to how lowercase letters are used as variables or constants in algebra. For example, you could write \[P = \{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}\] which allows you to refer to the set as P instead of needing to read the list of elements every time you want to talk about the set. This is like using the Greek letter π instead of needing to read off the first few digits of the non-repeating decimal expansion 3.14159265359… every time you refer to that number.
- NOTE 1: In mathematics, a set like $P,$ that is, $\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}$ is treated as a constant, so you cannot remove elements or insert new elements once a set has been defined and described. As an example, if you wanted to insert the name "white" into the set, you would need to define a new set \[ C = \{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"},\,\text{"white"} \}\] that contains four elements by appending "white" to the list used to define the old set P.
- NOTE 2:
  - One way of creating a new set from two other sets is to form the union of the two sets. The union of two sets is formed by joining the sets.
    For example, if $P = \{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}$ and $T = \{ \text{"yellow"},\,\text{"red"},\,\text{"blue"} \}$, then the union of the two sets is $\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"}, \, \text{"yellow"},\,\text{"red"},\,\text{"blue"} \},$ the set that contains the colors that are in at least one of the sets $P$ and $T.$
  - Another way of creating a new set from two other sets is to form the intersection of the two sets. The intersection is the meeting of the two sets.
    For example, if $P = \{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \}$ and $T = \{ \text{"yellow"},\,\text{"red"},\,\text{"blue"} \}$, then the intersection of the two sets is $\{ \text{"red"},\,\text{"blue"} \},$ the set that contains the colors that are in both of the sets $P$ and $T.$
    Likewise, if $S = \{ \text{"yellow"},\,\text{"cyan"},\,\text{"magenta"} \},$ the set of additive secondary colors, then the intersection of $P$ and $S$ is the set $\{ \},$ the set that contains zero elements. We call the set $\{ \}$ the empty set.
- NOTE 3: We can define a subset of any set by selecting zero or more of the members of the set. For example, $\{ \text{"green"},\,\text{"blue"} \}$ and $\{ \text{"green"} \}$ are subsets of $\{ \text{"red"},\,\text{"green"},\,\text{"blue"},\,\text{"red"} \},$ and so is $\{ \},$ the empty set.
- NOTE 4: A set can also be defined by a property instead of a list. This will be explained in the Set Theory chapter.
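The ideas in the notes above can be tried out with Python's built-in `set` type. This is a minimal sketch that reuses the sets $P,$ $T,$ $S,$ and $C$ from the text. One caution: Python `set` objects can be changed in place, unlike mathematical sets, so the sketch only builds new sets and never modifies an existing one.

```python
# A Python set ignores order and duplicates, like a mathematical set.
P = {"red", "green", "blue", "red"}
T = {"yellow", "red", "blue"}
S = {"yellow", "cyan", "magenta"}

print(len(P))                         # how many elements P contains
print(P == {"green", "red", "blue"})  # same set, listed in a different order

print(P | T)                          # union: in at least one of P and T
print(P & T)                          # intersection: in both P and T
print(P & S == set())                 # P and S have no elements in common

print({"green", "blue"} <= P)         # subset test
print(set() <= P)                     # the empty set is a subset of P

C = P | {"white"}                     # a NEW set; P itself is unchanged
print(len(C))
```

Note that the empty set is written `set()` in Python, because `{}` creates an empty dictionary instead.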

A one-to-one correspondence between two sets A and B is a pairing of each member of the set A with exactly one member of set B, and each member of set B with exactly one member of set A.

Examples - One-To-One Correspondences

As a first example, the one-to-one correspondence between the set of uppercase English letters and the set of lowercase English letters is represented in the image. You can choose each letter in the upper row and follow the arrow down to the corresponding lowercase letter that it is paired with.

Notice that you could choose any lowercase letter in the image and follow the arrow up to its corresponding uppercase letter. This shows that the one-to-one correspondence is invertible in the sense that there is a second one-to-one correspondence that can "undo" what the first one-to-one correspondence does. This is why the arrows in the image point in both directions: It is natural to interpret the arrow as meaning "change case" in either direction.
In fact, the image for the one-to-one correspondence actually represents two functions: The first function is the uppercase-to-lowercase conversion that uses only the down arrows, and the second function is the lowercase-to-uppercase conversion that uses only the up arrows.
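The letter pairing described above can be sketched in Python with a dictionary for each direction. The names `to_lower`, `to_upper`, and `is_one_to_one_correspondence` are assumptions made for this illustration and do not come from the text.

```python
import string

upper = string.ascii_uppercase  # "ABC...Z"
lower = string.ascii_lowercase  # "abc...z"

# The "down arrows": pair each uppercase letter with its lowercase letter.
to_lower = dict(zip(upper, lower))
# The "up arrows": the same pairs, read in the other direction.
to_upper = {low: up for up, low in to_lower.items()}

def is_one_to_one_correspondence(pairing, A, B):
    """Check that pairing matches each member of A with exactly one member
    of B, and each member of B with exactly one member of A."""
    partners = list(pairing.values())
    return (set(pairing) == set(A)
            and set(partners) == set(B)
            and len(set(partners)) == len(partners))

print(to_lower["Q"], to_upper["q"])
print(is_one_to_one_correspondence(to_lower, upper, lower))
# Pairing both "A" and "B" with "a" is NOT a one-to-one correspondence.
print(is_one_to_one_correspondence({"A": "a", "B": "a"}, "AB", "ab"))
```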

As a second example, consider the game "rock, paper, scissors." The image represents the one-to-one correspondence that pairs each element of the set $\{ \text{"rock"},\,\text{"paper"},\,\text{"scissors"} \}$ with the element of the same set that it loses to.
Image credit: Remixer-created derivative of original work "THE_GAME_OF_KEN.(1910)-illustration-_page_315.png". According to Japanese Copyright Law (June 1, 2018 grant) the copyright on the original work has expired and is as such public domain. Also, the original work is in the public domain in the United States because it was published (or registered with the U.S. Copyright Office) before January 1, 1930.

For this one-to-one correspondence, it makes more sense to use arrows that point in only one direction. In the image, all the arrows point to the winner, so each arrow can be read as "loses to," for example "rock loses to paper".
Note 1: A one-to-one correspondence between a set and itself is called a permutation. You will see the term "permutation" again, used with a related but different meaning, in the Counting: Permutations and Combinations chapter.
Note 2: "Rock, paper, scissors" is a modern form of a very old game that appears to date back at least 1,800 years to China during the time of the Han dynasty. You can learn more about the history of "rock, paper, scissors" and other intransitive games at this Wikipedia link. The image used in the Remix is derived from one that appears in a book published in 1910 that describes the Japanese game stone-ken or jan-ken.
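The "loses to" pairing can be sketched as a Python dictionary. The variable names here are assumptions chosen for illustration. The check below confirms that the pairing is a one-to-one correspondence between the set of moves and itself, that is, a permutation in the sense of Note 1.

```python
# Each arrow "x loses to y" becomes the entry x: y.
loses_to = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

moves = set(loses_to)
# Three inputs whose three outputs cover all of moves: every move appears
# exactly once as an input and exactly once as an output.
is_permutation = set(loses_to.values()) == moves
print(is_permutation)

# Reversing every arrow gives the "beats" pairing, which undoes loses_to.
beats = {winner: loser for loser, winner in loses_to.items()}
print(beats["paper"])  # the move that paper beats
```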

A natural number is one of the nonnegative integers. The letter $\mathbb{N}$ denotes the set of natural numbers, that is, \[\mathbb{N} = \{ 0,\,1,\,2,\,3,\,\ldots \}\] where the three dots are used because we cannot write down the entire list of natural numbers.
- NOTE: We almost always use the base-ten (decimal) place-value system to represent natural numbers, but later in this textbook you will see that there are other number base systems such as the base-two (binary) place-value system, that are useful in some contexts.
- WARNING: The definition of natural numbers used in this textbook is an ISO standard, but be aware that other textbooks and sources may use the "nonstandard" definition that the natural numbers are the positive integers. In this textbook, the set of natural numbers ALWAYS includes zero as well as the positive integers, which is the standard definition.

Example - Starting at 0

Welcome to your first opportunity to use PythonTutor in this textbook!

The idea that it is "natural" to start counting from 0 may be familiar to you from coding: Both Java arrays and Python lists are indexed starting at 0.

Notice in the Python code below that the initial item in list L is the string "Discrete" at index 0, not index 1.
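A minimal sketch of such a list follows. Only the initial item "Discrete" comes from the text; the other items are hypothetical choices made for illustration.

```python
# Only "Discrete" is taken from the text; the other items are made up.
L = ["Discrete", "Mathematics", "Remix"]

print(L[0])  # the initial item is at index 0
print(L[1])  # index 1 holds the SECOND item

# enumerate pairs each item with its index, starting from 0.
for index, item in enumerate(L):
    print(index, item)
```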


A set is called a finite set if either the set is empty or there is a one-to-one correspondence between the set and $\{1, 2, \ldots , n \}$ for some positive integer $n.$ For example, the image represents how a child may "count up to 5" by pairing fingers with the numbers in $\{1, 2, \ldots , 5 \}.$
A set that is not a finite set is called an infinite set. Both finite and infinite sets will be discussed in more detail in the Set Theory chapter.

You may be surprised to see Counting listed as a topic in a university-level course because you probably learned to count by putting physical objects (like fingers) into a one-to-one correspondence with number words such as "one, two, three, four, five" when you were a young child. That way of counting one by one is fine for small sets, but the counting techniques discussed in this textbook let you count the number of elements in very large finite sets quickly.
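Counting one by one can be sketched in Python as building a one-to-one correspondence with $\{1, 2, \ldots , n \}.$ The finger labels below are hypothetical and are used only for illustration.

```python
# Hypothetical labels for the five fingers of one hand.
fingers = ["thumb", "index", "middle", "ring", "little"]

# Pair each finger with the next number, starting from 1,
# the way a child counts out loud.
pairing = {number: finger for number, finger in enumerate(fingers, start=1)}
print(pairing)

# The last number used is the number of elements in the set.
count = max(pairing)
print(count)
```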

Example - The Multiplication Principle

The multiplication principle, also called the product rule, is the following statement.

Suppose that we have a procedure that consists of completing two steps, that there are k possible ways to complete the first step, and for each possible way of completing the first step, there are m possible ways to complete the second step. Then there are k⋅m ways to complete the procedure.

As an example, the number of possible strings of 2 characters with the first character being one of the 26 uppercase English letters and second character being one of the 10 Hindu-Arabic numerals is 260 = 26⋅10.

In this case, 26 and 10 are small integers so you can list all possibilities by arranging them in a table as shown in the image. What is important to notice is that this same technique (multiplying k and m) will work for much larger values of k and m, when creating such a table may be either not helpful or impossible.
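For values this small, the multiplication principle can be checked by brute force. The sketch below lists every letter-digit string and compares the length of the list with the product 26⋅10; the variable names are assumptions chosen for illustration.

```python
import string
from itertools import product

letters = string.ascii_uppercase  # k = 26 ways to complete the first step
digits = string.digits            # m = 10 ways to complete the second step

# Every (letter, digit) pair gives exactly one 2-character string.
strings = [letter + digit for letter, digit in product(letters, digits)]

print(len(strings))                # counted by listing every string
print(len(letters) * len(digits))  # counted with the multiplication principle
print(strings[:3], strings[-1])    # a few entries from the list
```

Listing works here, but the product k⋅m can be computed just as easily when k and m are far too large for any list to fit in memory.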

Challenge

Suppose that a U. S. state uses passenger vehicle license plate numbers that are 7 characters long, where the first character is one of the 9 nonzero Hindu-Arabic numerals, the second, third, and fourth characters are uppercase English letters, and the fifth, sixth, and seventh characters are Hindu-Arabic numerals. How many different possible passenger vehicle license plate numbers can the state create?

Click on "Hint," "Another challenge," and "One last challenge" to reveal the hidden text.

Hint

You can apply the multiplication principle more than once if the procedure you are trying to complete requires more than two steps.

Another challenge

The previous challenge was a bit simplified. In fact, the state places additional restrictions on the characters. To avoid confusing "0" with "O" or "Q", and "1" with "I", the state does not allow "O", "Q", and "I" as the second or fourth character on the license plate. With that change, how many different possible passenger vehicle license plate numbers can the state create?

Hint

You can still apply the multiplication principle more than once to answer this challenge question.

One last challenge

Both of the previous challenges were simplified. In fact, in addition to the restrictions on "O", "Q", and "I," the state does not use 1SWD000 through 1TZZ999, 1WAA000 through 1YZZ999, strings whose first four characters are 1UAA through 1VZZ, 1ZZA through 1ZZZ, or 3ZZA through 3ZZG for its passenger vehicle license plate numbers.
With these additional restrictions, how many different possible passenger vehicle license plate numbers can the state create?

Hint

You may want to use some addition and/or subtraction as well as the multiplication principle.

The Pigeonhole Principle states that if $n$ is a positive integer and $n+1$ objects are going to be assigned to $n$ categories, then at least one of the categories must be assigned at least 2 of the objects. Click here to see a commonly-used photoshopped image that illustrates this principle.
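The principle can be checked by brute force for a small case. The sketch below tries every possible way of assigning $n+1 = 4$ objects to $n = 3$ categories; the value of `n` and the function name `some_category_has_two` are assumptions chosen for illustration. This is a check of one small case, not a proof.

```python
from collections import Counter
from itertools import product

def some_category_has_two(assignment):
    """Return True if at least one category was assigned 2 or more objects."""
    return max(Counter(assignment).values()) >= 2

n = 3
categories = range(n)

# An assignment is a tuple: entry i is the category given to object i.
# product(..., repeat=n + 1) lists every way to assign n + 1 objects.
all_assignments = list(product(categories, repeat=n + 1))
print(len(all_assignments))

holds = all(some_category_has_two(a) for a in all_assignments)
print(holds)

# With only n objects, the conclusion can fail: each category gets one.
print(some_category_has_two((0, 1, 2)))
```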

A function from set D to set C is a rule that assigns to each element in D (that is, to each input value) exactly one output value from C. We also say each input value is mapped to the output value. A much more formal definition will be given in the Functions chapter of the textbook, but for now it is enough to understand functions in this way.
- The rule may be represented by a mathematical equation, a verbal description, a table of paired values, a plot of points, … or even code 😎!
- The set D of all input values is called the domain of the function.
- The set C that contains the output values is called the codomain of the function.
- The range of the function is the set that contains only the output values and no other elements. The range is always a subset of the codomain C, but may not contain every element of C. That is, some elements of C may not be outputs for the function. It is often important to distinguish the range from the codomain; this is discussed in detail in the Functions chapter.

Examples - Functions

Here are some examples of functions with their rules, domains, codomains, and ranges.

Any one-to-one correspondence between two sets A and B can be viewed as a function $f$ from A to B.
- The rule for $f$ is given by the pairing of elements in A with elements in B.
- The domain of $f$ is the set A.
- The codomain of $f$ is the set B.
- The range of $f$ is the set B, too, because every element in B is paired with an element in A by the one-to-one correspondence.
  
  Notice that the one-to-one correspondence can be used to define another function g from B to A with domain B, codomain A, range A, and rule given by the pairing of elements in B with elements in A. The two functions $f$ and g are called inverse functions because $f$ and g "invert" or "undo" each other. Inverse functions will be discussed in more detail in the Functions chapter.
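
A one-to-one correspondence and its inverse can be sketched with Python dictionaries. The sets A = {1, 2, 3} and B = {"a", "b", "c"} and the particular pairing are hypothetical, chosen only for illustration.

```python
# A hypothetical one-to-one correspondence between
# A = {1, 2, 3} and B = {"a", "b", "c"}, stored as the function f.
f = {1: "a", 2: "b", 3: "c"}

# Reverse each pairing to obtain the function g from B back to A.
g = {output: input_value for input_value, output in f.items()}

# g "undoes" f, and f "undoes" g.
print(all(g[f[a]] == a for a in f))
print(all(f[g[b]] == b for b in g))
```

Reversing the pairs works here only because no two inputs of `f` share an output. For a function that is not a one-to-one correspondence, the reversed dictionary would lose information.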

The floor of x, $\lfloor x \rfloor,$ and the ceiling of x, $\lceil x \rceil,$ are two functions defined for all real numbers as follows:
- $\lfloor x \rfloor$ is the greatest integer less than or equal to x. For example, $\lfloor -1.5 \rfloor = -2$ and $\lfloor 6.3 \rfloor = 6$
- $\lceil x \rceil$ is the least integer greater than or equal to x. For example, $\lceil -1.5 \rceil = -1$ and $\lceil 6.3 \rceil = 7$
- The domain of both the floor and ceiling functions is the set of all real numbers, $\mathbb{R}.$
- The range of both the floor and ceiling functions is the set of all integers, $\mathbb{Z}$ (NOTE: The German word for "numbers" is Zahlen, which is why the letter $\mathbb{Z}$ is used for the integers.)
- The codomain of these two functions can be chosen depending on the context. If we only need to consider output values we could choose the codomain to be the same set as the range, $\mathbb{Z},$ but if we want to plot these two functions in the xy coordinate plane we would choose the codomain to be the set of real numbers, $\mathbb{R}.$ This is discussed in the Functions chapter.
  
  The floor and ceiling functions are discussed in more detail in Appendix: Library of Functions.
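
In Python, the standard library functions `math.floor` and `math.ceil` compute the floor and the ceiling. The short sketch below evaluates both for the two values used above, plus the integer-valued input 4.0, which is an extra sample value added for illustration.

```python
import math

# For each sample x, show x, its floor, and its ceiling.
for x in (-1.5, 6.3, 4.0):
    print(x, math.floor(x), math.ceil(x))
```

Notice that when x is already an integer, the floor and the ceiling of x are equal.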

In Python, a list can be used to represent a function with inputs the valid integer indices for the list and outputs the values stored in the list. In an earlier example, we defined list L to be ["Discrete", "Mathematics"] and then evaluated L[0] and L[1] to access the strings stored in the list.
Note for programmers: In reality, the items in the list are references to objects that implement the two strings. That is, the list items are neither the strings nor the objects that implement strings but references to those objects, which are located elsewhere in memory.
The rule "Return the first character of the input Unicode string."
- The domain is the set of all strings of length greater than or equal to 1 that contain Unicode characters.
- The range is the set of all Unicode characters.
- We would need to decide what the codomain should be in this context: It could be the same as the range, or we could use the larger set of all strings that contain Unicode characters (including the empty string "".)
$f(x) = x^{2}$ from the set of real numbers to the set of real numbers.
- The domain and codomain are both the set of real numbers.
- The range is the set of nonnegative real numbers.
$g(x,\,y) = xy + y$ where x and y are real numbers.
- The domain is the set of ordered pairs of real numbers.
- The range is the set of all real numbers. The codomain is the same as the range.
See the Appendix: Library of Functions for other functions you should be familiar with.
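
Several of the rules above can be written directly as code. The following is a minimal sketch. The function name `first_character` and the sample inputs are assumptions made for illustration.

```python
def first_character(s):
    """Rule: return the first character of the input string.

    The empty string is not in the domain, so it is rejected.
    """
    if len(s) < 1:
        raise ValueError("the empty string is not in the domain")
    return s[0]


def f(x):
    """f(x) = x^2 for a real number x."""
    return x ** 2


def g(x, y):
    """g(x, y) = xy + y for real numbers x and y."""
    return x * y + y


# A list viewed as a function: inputs are the valid indices 0 and 1.
L = ["Discrete", "Mathematics"]

print(L[0], L[1])
print(first_character("Discrete"))
print(f(-3.0), g(2.0, 5.0))
```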

Example - Functions

The code below shows two functions that are user-defined in Python.

Click on the "Next" button to step through the code.

Here is another example.

Example - Functions and Data Types

What do you get when you "add" a Python object to itself?

Note that the answer depends on the object’s data type.

Click on the "Next" button to step through the code.
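
A minimal sketch of this idea might look like the following. The function name `add_to_itself` and the sample inputs are assumptions made for illustration.

```python
def add_to_itself(obj):
    """Return obj + obj; what "+" means depends on the type of obj."""
    return obj + obj


print(add_to_itself(21))        # int: ordinary addition
print(add_to_itself(1.5))       # float: ordinary addition
print(add_to_itself("ab"))      # str: concatenation
print(add_to_itself([1, 2]))    # list: concatenation
```

The same rule, "add the input to itself," gives a different kind of output for each data type, so the domain has to be stated before the function is fully described.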

A sequence is a function from the natural numbers, or a subset of the natural numbers, into another set (e.g., the natural numbers, or the real numbers, or a nonnumerical set.) For example, we can define a sequence by the rule \[a_{i} = 2i+1 \text{ for every natural number } i \] which describes the sequence of positive odd integers $a_{0} = 1,$ $a_{1} = 3,$ $a_{2} = 5,$ and so on.
- NOTE 1: It is common to use i as a variable in sequence notation because i is the initial letter of the word index. This i has nothing to do with the complex number $\sqrt{-1}.$ Mathematicians recycle and reuse letters!
- NOTE 2: It is traditional to write the input variable for a sequence as a subscript instead of putting it between parentheses. In the preceding example, $a_{i} = 2i+1$ has the same meaning as $a(i) = 2i+1$.
- NOTE 3: The output values of a sequence are called terms. For example, the 0th term of the sequence is $a_{0} = 1,$ the 1st term of the sequence is $a_{1} = 3,$ and so on.

A finite sequence is a sequence that is defined for only a finite subset of $\mathbb{N}.$ That is, the set of input $i$ values that make sense for the sequence is a finite set.
For example, a child counting up to five on the fingers of one hand is defining the sequence called $\textit{fingers}_{i}$ that is represented by the image.
Technically, the sequence $\textit{fingers}_{i}$ shown in the image is the inverse of the child’s actual counting sequence. Because the child assigns to each finger "input" exactly one number "output," the arrows would point up from a finger to the corresponding number.

The sequence $\textit{fingers}_{i}$ can be written, formally, as \begin{equation} \begin{aligned} \textit{fingers}_{1} {} & = \text{"Thumb"} \\ \textit{fingers}_{2} {} & = \text{"Index Finger"} \\ \textit{fingers}_{3} {} & = \text{"Middle Finger"} \\ \textit{fingers}_{4} {} & = \text{"Ring Finger"} \\ \textit{fingers}_{5} {} & = \text{"Pinky Finger"} \\ \end{aligned} \end{equation} but it is much more common to list the terms of the sequence in order: "Thumb," "Index Finger," "Middle Finger," "Ring Finger," "Pinky Finger."

Notice that Java arrays and Python lists are implementations of the mathematical concept of a finite sequence where the domain is the set of $i$ values $\{0, 1, \ldots , n-1 \}$ and $n$ is the length of the array or list.
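
As a sketch of this connection, the sequence $\textit{fingers}_{i}$ can be stored in a Python list. The helper name `fingers_term` is an assumption made for illustration. It shifts the index by 1 because the sequence in the text starts at $i = 1$, while Python list indices start at 0.

```python
fingers = ["Thumb", "Index Finger", "Middle Finger",
           "Ring Finger", "Pinky Finger"]


def fingers_term(i):
    """Return the term fingers_i for i in {1, 2, 3, 4, 5}."""
    if not 1 <= i <= len(fingers):
        raise IndexError("i is outside the domain of the sequence")
    return fingers[i - 1]  # Python lists are indexed from 0


# The first five terms of a_i = 2i + 1, indexed from 0 as in the text.
a = [2 * i + 1 for i in range(5)]

print(fingers_term(1), fingers_term(5))
print(a)
```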

An infinite sequence is a sequence that is not a finite sequence, that is, there are infinitely many $i$ values that make sense as inputs for the sequence. For example, the sequence \[ \text{isOdd}_{i} = \begin{cases} \text{1} & \text{ if } i \text{ is odd} \\ \text{0} & \text{ if } i \text{ is even} \\ \end{cases} \] is an infinite sequence with domain the natural numbers that has only two output values.

A bitstring is a finite sequence of the bits 0 and 1. Bitstrings are written as a string of 0s and 1s without spaces or commas between the terms of the sequence; for example, 01101011 is a bitstring of length 8. Bitstrings can be used to represent a sequence of answers to "Yes-No" or "True-False" questions, with "1" representing "Yes" or "True" and "0" representing "No" or "False." Bitstrings can also be used to represent numbers in binary notation, which will be discussed in the Number Bases chapter.
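
As a small illustration, a list of "True-False" answers can be converted into a bitstring. The particular list of answers below is hypothetical, chosen so that the result matches the bitstring of length 8 mentioned above.

```python
# Hypothetical answers to eight "True-False" questions.
answers = [False, True, True, False, True, False, True, True]

# Write "1" for True and "0" for False, with nothing between the bits.
bitstring = "".join("1" if answer else "0" for answer in answers)

print(bitstring, len(bitstring))
```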

Summation notation is a "shortcut" used to abbreviate a sum of a finite sequence of numbers, called addends, when the sequence contains a large number of addends.
As an example, the sum $1+3+5+7+9+11+13+15+17+19$ is abbreviated as $\sum\limits_{i=0}^{9}(2i+1)$.
As another example, the sum of the first $500$ positive odd integers, $1+3+5+\ldots+995+997+999$, is abbreviated as $\sum\limits_{i=0}^{499}(2i+1)$.
- NOTE 1: The variable i used in summation notation is called the index of summation and the symbol $\sum$ is the capital Greek letter "sigma." To compute the value of the sum, you generate the sequence of addends by substituting each integer value, starting with the lower limit written below the sigma and stopping at the upper limit written above the sigma, for i into the algebraic expression or function written to the right of sigma, then find the sum of all the numbers in the sequence.
- NOTE 2: "Infinite sums," more properly called infinite series, are not discussed in the Remix. The sum of an infinite series is defined as the limit of its sequence of partial (finite) sums, and "limits" is a topic from continuous mathematics, not discrete mathematics.
  Another use of infinite sum notation is to represent the generating function of a sequence, which is discussed in some discrete mathematics textbooks but not in the Remix. If you want to learn about generating functions, you can read about them in Oscar Levin’s Discrete Mathematics: An Open Introduction, 4th edition.
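
The two sums written above in summation notation can be computed with Python's built-in `sum` function. In this sketch the variable names `total_10` and `total_500` are assumptions. Note that `range(0, 10)` produces the index values 0 through 9, so the second argument of `range` is one more than the upper limit of the sum.

```python
# Sum of 2i + 1 for i = 0, 1, ..., 9.
total_10 = sum(2 * i + 1 for i in range(0, 10))

# Sum of 2i + 1 for i = 0, 1, ..., 499.
total_500 = sum(2 * i + 1 for i in range(0, 500))

print(total_10, total_500)
```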

Recursion is a process that defines an object, or computes a value, or describes the construction of an object or set of objects, using steps that refer to one or more previously completed steps.

Example - A Recursively-Defined Function

In this example, a Python function is defined recursively. The function takes any natural number input n (represented as an int in Python) and returns a value that we claim is $5^n$.

Click on the "Next" button to step through the code.

Notice that each time the function calls itself, a new instance of the function is created.

Later in the textbook, you will be able to prove that the power_of_5 function must return $5^n$ for every natural number input n using a proof technique called mathematical induction.
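
One way such a function could be written is sketched below. The name `power_of_5` comes from the text. The body is an assumption about how the recursion might be coded, using the base case $5^0 = 1$ and the recursive step $5^n = 5 \cdot 5^{n-1}$.

```python
def power_of_5(n):
    """Return 5**n for a natural number n, computed recursively."""
    if n == 0:
        return 1                    # base case: 5^0 = 1
    return 5 * power_of_5(n - 1)    # recursive step: 5^n = 5 * 5^(n-1)


print(power_of_5(3))
```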

Recurrence relations consist of one or more equations that define a sequence or a function with domain $\mathbb{N}.$

Examples - Recurrence Relations

The following examples show how to define a sequence from $\mathbb{N}$ to $\mathbb{N}$ using recursion. Notice that for each of the sequences we can compute the output value corresponding to any input value by repeatedly using the recurrence that relates a term to its preceding term in the sequence.

$b_{0} = 3$ and $b_{i+1} = b_{i} + 2$ for all natural numbers i.
$c_{0} = 3$ and $c_{i+1} = 2 c_{i}$ for all natural numbers i.

As an example, we can use the recurrence relations to compute $b_{1}$ and $c_{1}$ as follows. \[b_{1} = b_{0} + 2 = 3 + 2 = 5\] \[c_{1} = 2 c_{0} = 2 \cdot 3 = 6\]
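
The same computation can be sketched in code. In the two functions below (the loop-based implementation is an assumption, one of several reasonable ways to do this), each term is produced by starting at the initial value 3 and applying the recurrence i times.

```python
def b(i):
    """Return b_i, where b_0 = 3 and b_(i+1) = b_i + 2."""
    value = 3
    for _ in range(i):
        value = value + 2
    return value


def c(i):
    """Return c_i, where c_0 = 3 and c_(i+1) = 2 * c_i."""
    value = 3
    for _ in range(i):
        value = 2 * value
    return value


print(b(1), c(1))
```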

Challenge

A closed form for a sequence is a formula that lets you find the value of any term of the sequence by computing directly with the index i. In an earlier example, we had a sequence defined by the closed form $a_{i} = 2i+1$ for every natural number i: You can compute any term of the sequence by substituting directly a natural number value for the index i into the closed form, for example, $a_{8} = 2 \cdot 8+1 = 17.$

The challenge is to find closed forms for the two sequences in this example.

Use the following steps.

First, make a table of values that shows the value $i,$ $b_{i},$ and $c_{i},$ for each natural number i that is less than or equal to 8.

Secondly, make a conjecture (that is, a guess based on the values in the table) for the closed forms of the two sequences.

Thirdly, verify that the conjectured closed forms give the correct results for each of the natural numbers i that is less than or equal to 8. Notice that this does not show that the closed forms are correct for much larger natural number values for i such as 100 or 1,000,000. A method for validating the closed form for all natural numbers i will be introduced in the Proofs: Mathematical Induction chapter.

Hint

Look for patterns in the numbers in the table of values you made.

Help!

You may want to review arithmetic sequences and geometric sequences here.

In English, there are four types of sentences, depending on what is being communicated: statements (or declarative sentences), commands, exclamations, and questions. A proposition is a statement that declares a fact that is either True or False (but not both!). In mathematics, we are usually most interested in analyzing and verifying propositions.

A predicate is an incomplete proposition that contains one or more variables that need to be filled in to complete the proposition. One example of a predicate is "My major is $\rule{12mm}{.5pt}$." Notice that this becomes a proposition once the blank, which represents the variable in this case, has been filled in.
Another example of a predicate is "The positive integers m and n are prime numbers." Again, this becomes a proposition once values are substituted for the two variables.
In this textbook, predicates will often be written in a way similar to functions: \[ P(m, n) : \text{"The positive integers } m \text{ and } n \text{ are prime numbers."} \] Notice that the output of the predicate is a statement but the output does not tell us whether the statement is True or False - think of this like a programmer: The return value is a string, not a Boolean.
Two predicates are equivalent if for every possible substitution for the variables, the statement produced by the first predicate is true if and only if the statement produced by the second predicate is true.
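
The distinction between the statement a predicate produces and the truth value of that statement can be sketched in code. The function `P` below follows the text and returns a string. The helper functions `is_prime` and `P_is_true` are assumptions added for illustration, with a deliberately simple primality check.

```python
def P(m, n):
    """The predicate P(m, n): return the completed statement as text."""
    return f"The positive integers {m} and {n} are prime numbers."


def is_prime(k):
    """Return True if the integer k is prime (simple trial division)."""
    if k < 2:
        return False
    return all(k % d != 0 for d in range(2, k))


def P_is_true(m, n):
    """Return the truth value (a Boolean) of the statement P(m, n)."""
    return is_prime(m) and is_prime(n)


print(P(2, 3), P_is_true(2, 3))
print(P(4, 3), P_is_true(4, 3))
```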

An algorithm is a finite sequence of commands and statements that describe a process for completing a task.
It’s likely that you have learned how to perform many algorithms in your previous mathematics education, but in this textbook (and in computer science in general) it is more important that you learn how to analyze and compare algorithms.
One example is the following (correct but inefficient) algorithm for division of positive integers.
- Task: Given two positive integers a and b, compute the quotient q and remainder r so that
  $a = q \cdot b + r$ and $0 \leq r < b.$
- Input: Two positive integers a and b
- Steps:
  1. Set r equal to a and set q equal to 0.
  2. If r is greater than or equal to b
    
    set r equal to r - b
    
    add 1 to q
  3. If r is greater than or equal to b then repeat step 2
- Output: Integers q and r such that both $a = q \cdot b + r$ and $0 \leq r < b.$
  - q is the quotient, that is, the number of times each of the two assignments under step 2 was executed.
  - r is the remainder, that is, the result of the last execution of step 2, so $r = a - q \cdot b.$

Example - Division of Integers by Repeated Subtraction

The code below implements integer division for positive integers a and b.

Click on the "Next" button to step through the code.

Notice that each time the loop executes, the code prints an equation that shows that a is the sum of a whole number times b and a remainder r. The loop terminates when we compute a value for the remainder r that is both less than b and greater than or equal to 0.
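
A minimal sketch of such an implementation is shown below. The function name `divide` and the exact format of the printed equation are assumptions. The steps follow the algorithm described above.

```python
def divide(a, b):
    """Divide positive integers a and b by repeated subtraction.

    Returns the pair (q, r) with a = q * b + r and 0 <= r < b.
    """
    r = a
    q = 0
    while r >= b:
        r = r - b
        q = q + 1
        # Each pass shows a as a multiple of b plus a remainder.
        print(f"{a} = {q} * {b} + {r}")
    return q, r


a = 13
b = 3
q, r = divide(a, b)
print("quotient:", q, "remainder:", r)
```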

Question

In the code, $a = 13$ and $b=3.$ How many times does the block of code within the loop execute?

Hint

You can answer this by stepping through the code using the "Next" button.

Question

How many times does the block of code within the loop execute if $a = 13$ and $b=6?$ If $a = 13$ and $b=9?$

Hint

You can answer this by editing the code, changing the value of b, and stepping through the code using the "Next" button.

Question

In the code suppose that $a = 13$ and that $b$ can be assigned any positive integer value that is less than or equal to 13. Let’s say that the worst-case behavior for inputs of the form $(13, \, b)$ is the maximum number of executions of the block of code within the loop that occurs for one of these inputs. What value(s) of b correspond to the worst-case behavior, that is, what value(s) of b correspond to the maximum number of executions of the block of code within the loop for all inputs of the form $(13, \, b)$ where b is a positive integer value that is less than or equal to 13?

Hint

You could answer this by editing the code, changing the value of b to values other than 3, and stepping through the code using the "Next" button. BUT, it may be faster if you use reasoning about the value(s) of b instead.

Question

How many times does the block of code within the loop execute if $a = 130$ and $b=3?$ If $a = 299$ and $b=3?$

Hint

You can answer this by editing the code, changing the value of a, and stepping through the code using the "Next" button. BUT, it may be faster if you use reasoning about the value(s) of a instead.

Challenge

Now suppose that, in the code, the ordered pair $(a, \, b)$ of variables can be assigned any ordered pair of positive integer values with a greater than or equal to b. Find a formula for a worst-case complexity function $W(a)$ that assigns to each positive integer input a the output that is the maximum number of executions of the block of code within the loop for all positive integer pairs $(a, \, b).$

Hint

Try to form a conjecture by editing the code, changing the value of a and then holding a constant while using various values of b that are less than or equal to a, then stepping through the code using the "Next" button. Refer back to your answers to the previous questions, too.

Challenge

The algorithm implemented in the code is correct but not very efficient. You probably learned how to do division by hand in elementary or middle school. Use your knowledge of how to do division by hand to (1) change the Python code to be more efficient (and still correct!) then (2) determine the worst-case behavior as the maximum number of times the block of code within the loop will execute, in terms of the variables a and/or b with a greater than or equal to b.

Challenge

For any integer a and any positive integer b, we can compute integers q and r so that both $a = q \cdot b + r$ and $0 \leq r < b.$ What changes are needed to the algorithm to compute q and r correctly if a is zero or negative?

Hint

Consider what changes are needed to the loop condition and the computations within the loop. Use the print statements to help you see what changes are needed.

Based on the case $b=2$ in the division algorithm discussed above (and the Challenge), every integer a can be written in the form $a = q \cdot 2 + r$ where q is an integer and $0 \leq r < 2.$ The integer a is even if $r=0$ and is odd if $r=1.$ So we have a precise formal way of understanding and discussing odd and even integers - this may seem unnecessary (or even completely silly), but as you continue reading this textbook you will see that precise formal definitions and descriptions are useful when you need either to justify that certain statements are true or to validate that certain processes always produce correct and expected results.

Suppose that $a$ and $b$ are integers, which could be positive or negative or zero. The integer b is called a factor of a (or divisor of a), and a is called a multiple of b if $a = q \cdot b$ for some integer q. For example, 2 is a factor of 10, and 10 is a multiple of 2, because $10 = 5 \cdot 2.$ As another example, $-2$ is a factor of 10, and 10 is a multiple of $-2$ because $10 = (-5) \cdot (-2).$
A positive integer n that is greater than 1 is called prime if the only positive integer factors of n are 1 and n itself, and is called composite otherwise. For example, 2 is prime since its set of positive integer factors is $\{ 1,\,2 \}$, but 6 is composite since its set of positive integer factors is $\{ 1,\,2,\,3,\,6 \}$.
- NOTE: The integer 1 is considered neither prime nor composite. The reason for this is beyond the scope of this textbook but would be discussed in a more advanced math course in ring theory.

Two integers a and b are called relatively prime if the only common positive integer factor of a and b is 1; this is equivalent to stating that the two integers do not share any prime factors. For example, 10 and 21 are relatively prime integers.
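
These definitions translate almost word for word into code. In the sketch below, the function names are assumptions made for illustration, and `math.gcd` from the Python standard library is used to test whether two integers are relatively prime (their greatest common divisor is 1).

```python
import math


def is_factor(b, a):
    """Return True if b is a factor of a, that is, a = q * b for some integer q."""
    if b == 0:
        return a == 0  # q * 0 is always 0
    return a % b == 0


def positive_factors(n):
    """Return the set of positive integer factors of a positive integer n."""
    return {d for d in range(1, n + 1) if n % d == 0}


def is_prime(n):
    """Return True if n > 1 and its only positive factors are 1 and n."""
    return n > 1 and positive_factors(n) == {1, n}


def relatively_prime(a, b):
    """Return True if the only common positive factor of a and b is 1."""
    return math.gcd(a, b) == 1


print(is_factor(2, 10), is_factor(-2, 10))
print(positive_factors(6), is_prime(2), is_prime(6), is_prime(1))
print(relatively_prime(10, 21))
```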

A relation on the sets A and B is an association between elements from set A and set B; A and B are often the same set. Relations will be defined much more formally and precisely in the Relations chapter of the textbook.
Here are some examples of relations:
- The ordering relation "is less than," $x < y,$ for real numbers x and y. So $3 < 4$ but $5 \nless 4.$ The slash through the "<" symbol means that "5 is not related to 4" in the way we want.
  The orderings $>$, $\geq$, and $\leq$ are also examples of relations.
- The equality relation $s=t$ for any elements s and t of the same set A.
  A related example is inequality, $s \neq t.$
- The divisibility relation "a is a divisor of b" (or "a divides b") for integers a and b; this is sometimes written as $a \mid b.$ So for example, $2 \mid 4$ but $3 \nmid 4.$
- For two integers a and b, we say that "a has the same parity as b" if either both a and b are odd or both a and b are even.
- Any function $f$ with domain A and codomain B is a relation since the function associates each element a in A with exactly one element b of B, namely $b = f(a)$.
- A relation can also involve more than two sets. As an example, imagine a database of records that has three fields: a student’s name, a student’s college identification number, and the student’s major. The database can be viewed as a set R of ordered triples. So, for example, if a student named Chris Garcia has identification number 900123001 and is a Computer Science major, the set R would contain as an element the ordered triple ("Chris Garcia", 900123001, "Computer Science").
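
Some of the relations above can be sketched in code as sets of ordered pairs or ordered triples. Restricting the integers to 1 through 6 is an assumption made only to keep the sets small. The names `divides`, `same_parity`, and `R` are also chosen for illustration.

```python
numbers = range(1, 7)  # the integers 1, 2, ..., 6

# (a, b) is in the set exactly when a is a divisor of b.
divides = {(a, b) for a in numbers for b in numbers if b % a == 0}

# (a, b) is in the set exactly when a has the same parity as b.
same_parity = {(a, b) for a in numbers for b in numbers if a % 2 == b % 2}

# A relation involving three sets: name, identification number, major.
R = {("Chris Garcia", 900123001, "Computer Science")}

print((2, 4) in divides, (3, 4) in divides)
print((1, 3) in same_parity, (1, 2) in same_parity)
```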

A graph is a mathematical object that consists of vertices (also called nodes) that are connected by edges. Graphs are often represented by drawings like the ones shown in the following examples, but you can also represent a graph in other ways that are easier and more efficient to use in code; this will be discussed in the Graphs chapter.

The drawing of a graph is not treated like a geometric polygon: The only two points "on an edge" are the edge’s endpoints. Edges are just connectors between vertices and points that are not indicated as endpoints of an edge are ignored. Also, in a drawing of a graph, the lengths of edges and the straightness or curvedness of edges are not important, just the connections between the edges' endpoints.

Some high school-level textbooks use the term vertex-edge graph to distinguish this type of graph from graphs (that is, plots of points in the $xy$-plane) for equations, functions, or statistical data.

Example - Two Drawings of One Graph

Keep in mind that a graph is NOT the same as a drawing of the graph. In fact, a graph can usually be drawn in many different ways that may look very different. What is important is the connections, represented by the edges, between pairs of vertices.

In the image, two different drawings are shown for the same graph. Notice that in each drawing, the connections between pairs of vertices are the same: The only pair of vertices that is not connected by an edge is $\{ C, D \}.$

Also notice that there is no vertex drawn where the two edges cross in the 1st drawing on the left, so this graph has exactly 4 vertices: $A,$ $B,$ $C,$ and $D.$
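
Because only the connections matter, the graph in this example can be stored in code without any drawing at all. The sketch below is one possible representation, not necessarily the one used later in the textbook. It stores each edge as a two-element set and then builds, for each vertex, the set of vertices it is connected to.

```python
from itertools import combinations

vertices = {"A", "B", "C", "D"}

# Every pair of distinct vertices is connected except the pair {C, D}.
all_pairs = {frozenset(pair) for pair in combinations(sorted(vertices), 2)}
edges = all_pairs - {frozenset({"C", "D"})}

# For each vertex, collect the vertices joined to it by an edge.
adjacency = {
    v: {w for w in vertices if frozenset({v, w}) in edges}
    for v in vertices
}

print(len(edges))
print(sorted(adjacency["C"]))
```

Either drawing of the graph leads to exactly the same `edges` and `adjacency` data, which is one way to see that the two drawings represent the same graph.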

Example - A Network Of Students

A graph can represent relationships between pairs of people.

Here is a graph that represents whether pairs of students are enrolled in at least one class together. Each of the 7 vertices represents a student, and each of the 7 edges represents a pair of students who are enrolled in a class together. The graph indicates that Adil and Elias are enrolled in at least one class together and that Elias and Maya are enrolled in at least one class together.

Question

Are all three of the students Adil, Elias, and Maya enrolled in at least one class together?

Hint

Two students are enrolled in at least one class together if and only if there is an edge connecting the vertices labeled by the two students' names.

Question

Are all three of the students Sofia, Elias, and Jun enrolled in at least one class together?

Hint

Can you imagine two different scenarios, one where the answer is "Yes" and another where the answer is "No"?

Question

The vertex for Li has degree 2 because it occurs as an endpoint of an edge 2 times. Can you determine how many classes Li is enrolled in from the graph?

Hint

Remember that an edge indicates that the pair of students at the endpoints are enrolled in at least one class together.

Question

Chris is represented by an isolated vertex that is not the endpoint of any edge (so that vertex has degree 0.) Does this mean that Chris is enrolled only in Independent Studies classes with no other students?

Hint

Think carefully about what an edge represents in this graph.

Examples - Complete Graphs and Star Graphs

Here are some other examples of graphs.

A complete graph is a graph in which every pair of distinct vertices are the endpoints of exactly one edge. The image shows the complete graph on 4 vertices. Notice that two edges appear to "intersect" but there is no vertex drawn where the edges cross, so these edges do not have any points in common - as stated above, an edge contains only its endpoints.

A star graph is a graph that has one central vertex that is one of the endpoints of every edge in the graph. The image shows the star graph on 6 vertices. The star graph is one example of a tree, a graph in which for every pair of distinct vertices there is exactly one path of edges that can be used to connect the vertices. Some of the many applications of trees in computer science will be discussed in the Trees chapter.
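
Both families of graphs are easy to generate in code. In the sketch below, the function names and the choice to label the vertices 0, 1, 2, and so on (with 0 as the central vertex of the star) are assumptions made for illustration.

```python
from itertools import combinations


def complete_graph_edges(n):
    """Edges of the complete graph on the vertices 0, 1, ..., n - 1."""
    return {frozenset(pair) for pair in combinations(range(n), 2)}


def star_graph_edges(n):
    """Edges of the star graph on n vertices with central vertex 0."""
    return {frozenset({0, v}) for v in range(1, n)}


print(len(complete_graph_edges(4)))
print(len(star_graph_edges(6)))
```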

The design of this book is to introduce each concept informally, as was done for the preceding foundational ideas, then notice properties and patterns, generalize from what has been noticed, and formalize the ideas to prepare for even deeper analysis.

And congratulations if you read through all of those foundational mathematical ideas in this subsection and worked through all the Questions and Challenges! If you compare the list of ideas to the Table of Contents, you will see that you have touched on every one of the topics that will be discussed in this textbook!

1.2.3. On-Demand Math Resources and Library Of Functions Appendices

Two appendices to this textbook contain additional mathematics that you may need to review as you work your way through the textbook.

1.2.4. Do I Need To Know How To Program In Python?

You are NOT expected to know the Python programming language before you start this course.

As you’ve seen above, this textbook contains Python code snippets that are designed to aid your understanding of the mathematical concepts. It is NOT one of the goals of this textbook to teach you Python, but instead "just enough Python" to be able to examine, run, and alter the existing code snippets.

The appendices "An Introduction to Python" and "Python Syntax Examples" cover most of the basic concepts you will need from the Python programming language.

1.3. Applications of Discrete Mathematics

Remixer’s Note: This section is taken from the original “Discrete Math” book, with only a few minor edits.

Discrete mathematics is applied in many areas including the physical, engineering, and increasingly, the social sciences.

1.3.1. Applications to Applied Mathematics

Most problems that involve computational methods need to be solved using computers. Rather than solve for the temperature map of an entire planar region, we solve for the temperature on a discrete mesh or grid of points that forms a representative subset of the planar region.

Figure 1. Continuous temperature profile versus discrete meshed representation on computer

1.3.2. Applications to Information Technology and Computer Science

Discrete mathematics is needed for computer science because information and data are stored digitally. Digitally represented data is inherently discrete and is processed using discrete methods. For example, a coarse-grid discrete representation of the 2-d temperature distribution from the plate above could be:

$ \left(\begin{matrix}1&1&1\\2&4&8\\3&9&27\\4&16&64\\5&25&125\\\end{matrix}\right) $

A voter registry may have voters in a database accessible from a list:

$ \left(\begin{matrix}John\ Smith\\Raheem\ Johnson\\.\\.\\.\\Sarah\ Muller\\\end{matrix}\right) $

This list may need to be accessed and sorted, say geographically or alphabetically.

1.3.3. Applications to Data Science

Data science solutions to many problems use machine learning algorithms that are inherently discrete in nature. The information that needs processing is discrete, as are the basic problems in data science such as classification or clustering problems. In particular:

Information consisting of data sets is represented using various data structures including graphical structures such as trees. Data science methods and algorithms involve procedures that manipulate these graphical structures, for example, networks, classification trees, and decision trees.
Classification problems are discrete in nature. Classifying tumors as malignant or as benign involves trying to predict the value of a variable $Y$ that we can think of as taking on one of two values, either $1$ (malignant) or $0$ (benign). There are various algorithms used in classification problems such as binary tumor classification, including methods from probability.

Figure 2. Binary classification algorithm ("1" malignant, "0" benign)

1.3.4. Applications to Engineering

Digital signal processing involves taking a video, audio, or other signal like temperature, pressure, position and velocity, which is continuous, digitizing it and then processing the digital signal mathematically.

Figure 3. Continuous vs discrete time signal

1.3.5. Applications of Combinatorics

Combinatorics involves, in part, counting the number of objects that satisfy a specified condition, from sets of variable size. Enumeration and combinatorics are important in many areas, for example:

Calculating the number of steps an algorithm needs to process a data set of variable size $n$. This quantity is called the computational cost of the algorithm as a function of $n$.
Calculating the possible number of codes in a cryptographic code system.

1.3.6. Applications of Graph Theory

Graph theory, which is the study of structures constructed with nodes and the edges joining them, has applications in many fields including:

Chemistry - representing molecular bonding and structure

Figure 4. Graph theory and molecular bonding

Information technology and computer science - ranking pages on the internet, with pages considered as nodes and page links as edges.

Figure 5. Page ranks using a graph theory model.

Industrial engineering and network optimization
- Traffic routes (computer, internet, air, highway, subway systems) can be represented with stations as nodes and connections as edges.
- Often we are interested in finding an optimal path in a network such as in the following example, finding the shortest tour over a series of towns on a map.

An example of the shortest tour problem, solved using software, is shown below.

Figure 6. Using software like Mathematica to solve a network optimization problem such as finding the shortest tour.

1.3.7. Applications of Probability and Statistics

Many probability assignments are based on counting and combinatorial methods.

If we assume that the likelihood of rain is the same on any day in the month of September, we might be interested in the probability that it rains on $0$ days, it rains on exactly $1$ day, exactly $2$ days, etc. Such probability assignments are called discrete distributions, by contrast with continuous distributions like the bell curve.
Probability and statistical techniques are also often used in data science. The binary classification problem of, say, classifying a tumor as malignant or benign uses a statistical modeling technique called regression, specifically logistic regression, to determine the strength of the relationship between the independent variables and the dependent variable. In the tumor grading example the independent variables would be $(x_1, x_2)$ (elastic heterogeneity, nonlinear elasticity), and the dependent variable would be $Y$, classified as $0$ or $1$ (benign or malignant).
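As a minimal sketch of a discrete distribution, the following Python code computes the probability that it rains on exactly $k$ of the 30 days of September. It assumes, beyond what is stated above, that the days are independent of one another and that the daily chance of rain is 0.1. Both the value 0.1 and the function name rain_probability are made up for illustration.

```python
from math import comb

DAYS = 30      # September has 30 days
P_RAIN = 0.1   # hypothetical daily chance of rain (an assumption)

def rain_probability(k, days=DAYS, p=P_RAIN):
    """Probability of rain on exactly k of the days, assuming independent days."""
    # Choose which k days are rainy, then multiply the chances for each day.
    return comb(days, k) * p**k * (1 - p)**(days - k)

# Probability of rain on exactly 0, 1, 2, or 3 days.
for k in range(4):
    print(k, round(rain_probability(k), 4))

# The probabilities over all possible day counts add up to 1.
total = sum(rain_probability(k) for k in range(DAYS + 1))
print(round(total, 10))
```

The factor comb(days, k) is itself a count (the number of ways to choose which $k$ days are rainy), which is one way that counting methods enter probability.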

1.3.8. Applications to Social Sciences

Discrete mathematical techniques are important in understanding and analyzing social networks including social media networks.

The mathematics of voting is a thriving area of study, including mathematically analyzing the gerrymandering of congressional districts to favor and/or disfavor competing political parties. The following example illustrates some of the fundamental ideas related to gerrymandering.

Example—Mathematics and Voting

Consider a fictitious state made up of $10$ congressional districts with $7$ thousand voters in each district. To win a district a party (Green or Blue) needs to win $4$ thousand or more votes. Consider the following two districting map scenarios. In each scenario, the blue party earns $28$ thousand votes, and the green party earns $42$ thousand votes. In scenario $A$, the blue party wins $2$ out of $10$ districts, but in scenario $B$ it wins $7$ out of $10$ districts.

Figure 7. Gerrymandering example with two equivalent votes
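The arithmetic behind the two scenarios can be checked with a short Python sketch. The two lists of district-by-district vote counts below are hypothetical: they were chosen only so that the totals match the example, and they are not read from the figure. The function name districts_won is also made up for illustration.

```python
# Blue-party votes (in thousands) in each of the 10 districts.
# Both lists are hypothetical; they are not taken from the figure.
scenario_a = [7, 7, 2, 2, 2, 2, 2, 2, 1, 1]
scenario_b = [4, 4, 4, 4, 4, 4, 4, 0, 0, 0]

VOTERS_PER_DISTRICT = 7   # thousands of voters in each district
VOTES_TO_WIN = 4          # thousands of votes needed to win a district

def districts_won(blue_votes):
    """Count the districts in which blue has at least VOTES_TO_WIN thousand votes."""
    return sum(1 for v in blue_votes if v >= VOTES_TO_WIN)

for name, votes in [("A", scenario_a), ("B", scenario_b)]:
    green_total = VOTERS_PER_DISTRICT * len(votes) - sum(votes)
    print(name, sum(votes), green_total, districts_won(votes))
# prints:
# A 28 42 2
# B 28 42 7
```

In both hypothetical scenarios the statewide totals are identical (28 thousand blue, 42 thousand green); only the way the voters are grouped into districts changes the number of districts won.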

1.4. Links to the Informal Definitions in this Chapter

For your reference in the future, here are direct links to most of the bullet points in the list of foundational ideas discussed in this section.

Sets, including subsets, the empty set, unions of sets, and intersections of sets

One-to-one correspondence, including the example "rock, paper, scissors"

Natural Numbers

Finite and infinite sets

Counting, including the Multiplication Principle and Pigeonhole Principle

Functions, including domain, codomain, range, inverse function, and the floor and ceiling functions $\lfloor x \rfloor$ and $\lceil x \rceil$

Sequences, including index and terms

Bitstrings

Summation Notation

Recursion

Recurrence Relations, including closed forms

Propositions

Predicates

Algorithms, including the Division Algorithm

Relations

Graphs and trees

2. Number Bases

This chapter was last updated on April 1, 2025.

It’s likely that you learned how to represent positive integers using the base-ten place-value system when you were young. This system uses ten symbols, the Hindu-Arabic numerals ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, and ‘9’, to represent any natural number.

In the base-ten place-value system that uses decimal notation, each natural number is represented by a numeral which is a string of one or more of the ten Hindu-Arabic digits. However, in some computer science contexts, it is more useful to represent natural numbers in a place-value system that uses a different number base such as binary (base-two), octal (base-eight), or hexadecimal (base-sixteen). Using or thinking about numbers represented by numerals in these other systems can help you develop more efficient algorithms and recognize when certain numerals can be interpreted as encodings of multiple pieces of information.

In this chapter, you will learn how to represent natural numbers using place-value systems with bases other than ten.

Key terms and concepts covered in this chapter:

Number bases
- binary
- hexadecimal
- octal
- decimal

2.1. Numbers, Numerals, and Digits

In everyday life, it is common to treat numbers and numerals as if they are the same. However, it is important in this chapter to distinguish these concepts.

A number is an abstract idea or mental concept of a count, a measure, a rank in an ordering, etc.
A numeral is a word, phrase, symbol, or string of symbols that is used to represent a number.
A digit is a single symbol that represents a number. A digit is a numeral, but multiple digits can be combined to create other numerals. For example, each of the ten Hindu-Arabic numerals is a digit that represents a natural number that is less than ten, and multiple Hindu-Arabic numerals can be combined to create numerals that represent numbers that are greater than or equal to ten.
NOTE: The word "digit" comes from the Latin word digitus which means "finger" or "toe."

Example 1 - Numerals, Numbers, and Digits

Consider the following words:

"five",
"cinco",
"خمسة" ("khamsa"),
"पाँच" ("paanch"),
"五" ("wǔ").

These words, taken from different languages, are examples of numerals, words that represent a number. In fact, all those numerals represent the same number, but no one of the numerals is the number. The number itself is an abstract idea that can be referred to using any of those numerals.

In the same way,

the Roman "Ⅴ",
the Braille "⠼⠑",
the Coptic Epact "𐋥",
the Eastern Arabic "٥",
the Western Arabic "5".

are other ways of representing the same number that the words above represent.

Semiotics

In semiotics, the study of signs, symbols, and signification, the number is a sign, made up of the signified mental concept and the signifier (the numeral.) Further discussion is beyond the scope of this textbook, and beyond the remixer’s expertise! Here is a link if you want to learn more about semiotics.

This is not a pipe!

Read this Wikipedia article and pay attention to the quote from the artist, René Magritte.

2.2. Review Of The Base-Ten Place Value System

It’s likely that everything you are about to read in this section is knowledge you’ve had for many years. The purpose of stating everything so explicitly is to provide an example you can compare with place-value systems that use other bases.

A base-ten numeral is a string formed from one or more digits (i.e., Hindu-Arabic numerals.)

The string is read from left-to-right.
Each digit in the string represents a multiple of a power of the base, ten, depending on its position in the string.
The rightmost place represents a multiple of 1 (which is ten raised to the power zero) and each of the other places represents a multiple of a power of ten that is one greater than the power of ten represented by the place to its right.
Notice that the base itself, the number ten, is represented by the string "10" in this place-value system. The string "10" represents the number described by the phrase "1 ten plus 0 ones".
As an example, the string "101" represents the number described by the phrase "1 hundred plus 0 tens, plus 1 ones," where 1 hundred is the same as ten tens. That is, \[ 101 = 1 \cdot 10 \cdot 10 + 0 \cdot 10 + 1 \cdot 1 \] The expression on the right-hand side of the previous equation is referred to as expanded form in school-level mathematics.

NOTE: The Hindu-Arabic numerals evolved from earlier Brahmi versions. You can look at the image at this web page which shows a few steps in this evolution. Also, you can learn more about the history of the Hindu-Arabic numerals from the links in the "Notes" and "References" sections of this Wikipedia page.

2.2.1. An Algorithm That Computes The Digits Of A Base-Ten Numeral

In this subsection, an algorithm is presented for computing the digits in the expanded form of a base-ten numeral of the natural number $n.$ This may seem to be a complicated way to do something very simple since we could just read off the digits, but the important thing to notice about this algorithm is that the role that ten plays can be played by any other positive integer constant greater than one! This means that this algorithm can be adapted to find the "digits" in the expanded form of a base-$b$ numeral for the number $n$ for any base $b$ we choose.

Task: Given the natural number $n$, compute an array of natural numbers $s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]$ so that each of $r_{0}, r_{1}, \ldots , r_{k}$ is represented by a single digit in base-10, and \[ n = r_{k} \cdot 10^{k} + \ldots + r_{1} \cdot 10^{1} + r_{0} \cdot 10^{0} \] where, for $n \geq 1$, $k$ is the greatest natural number such that $10^{k} \leq n.$
- Input: The natural number $n$
- Steps:
  1. Set $a$ equal to $n$
  2. Set $s$ to the empty array (We will append the values $r_{0}, r_{1}, \ldots , r_{k}$ to the array $s$ as we compute them)
  3. Divide $a$ by 10 to find natural numbers $q$ and $r$ such that both $a = q \cdot 10 + r$ and $0 \leq r < 10.$
  4. Append $r$ to the end of array $s.$
  5. If $q \neq 0$
     - set $a$ equal to $q$
     - go to step 3
  6. Return the array $s.$
- Output: An array of natural numbers $s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]$ where each number is represented by a single digit, and \[ n = r_{k} \cdot 10^{k} + \ldots + r_{1} \cdot 10^{1} + r_{0} \cdot 10^{0}.\]

That is, we rewrite $n$ as $r_{0} + 10 \cdot (r_{1} + 10 \cdot (r_{2} + \ldots + 10 \cdot (r_{k-1} + 10 \cdot r_{k}) \ldots ))$

Example 2 - Finding The Digits Of A Base-Ten Numeral

The following equations summarize how the preceding algorithm determines the digits in the base-ten expanded form numeral for the number 432.

\begin{equation} \begin{aligned} 432 {} & = 43 \cdot 10 + 2 & q {} & = 43 & r {} & = 2 & s & = [2] \\ 43 {} & = 4 \cdot 10 + 3 & q {} & = 4 & r {} & = 3 & s & = [2, 3] \\ 4 {} & = 0 \cdot 10 + 4 & q {} & = 0 & r {} & = 4 & s & = [2, 3, 4] \\ \end{aligned} \end{equation}

Notice that the items in $s = [ r_{0}, \, r_{1}, \, r_{2} ]$ are the numbers corresponding to the digits of the numeral $“432”$ in reverse order, so \begin{equation} \begin{aligned} 432 & = r_{2} \cdot 10^{2} + r_{1} \cdot 10^{1} + r_{0} \cdot 10^{0} \\ & = 4 \cdot 10^{2} + 3 \cdot 10^{1} + 2 \cdot 10^{0} \end{aligned} \end{equation}

Notice that the algorithm is rewriting $432$ as the sum $2 + 10 \cdot (3 + 10 \cdot 4).$
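The steps of the algorithm in this subsection can be sketched in Python as follows. This is one possible translation, not the only one: the function name base_ten_digits is made up for illustration, and Python's built-in divmod is used to carry out the division in step 3.

```python
def base_ten_digits(n):
    """Return [r_0, r_1, ..., r_k], the base-ten digits of n, ones place first."""
    a = n                      # step 1
    s = []                     # step 2
    while True:
        q, r = divmod(a, 10)   # step 3: a = q*10 + r with 0 <= r < 10
        s.append(r)            # step 4
        if q == 0:             # step 5: stop once the quotient is 0 ...
            return s           # step 6
        a = q                  # ... otherwise go back to step 3 with a = q

print(base_ten_digits(432))    # prints [2, 3, 4]
```

The only place the base appears is the constant 10 in the call to divmod, which is why the same steps can be reused for other bases.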

2.3. The Base-Two Place Value System (Binary Notation)

This subsection describes the base-two (binary) place value system. You will see that much of what is written here is the result of replacing "ten" by "two" in the description of the base-ten (decimal) system in the previous section.

A base-two numeral is a string formed from one or more of the two binary digits (or bits) ‘0’ and ‘1’.

The string is read from left-to-right.
Each digit in the string represents a multiple of a power of the base, two, depending on its position in the string.
The rightmost place represents a multiple of 1 (which is two raised to the power zero) and each of the other places represents a multiple of a power of two that is one greater than the power of two represented by the place to its right.
Notice that the base itself, the number two, is represented by the string "10" in this place-value system. The string "10" represents the number described by the phrase "1 two plus 0 ones".
As an example, the string "101" represents the number described by the phrase "1 four plus 0 twos, plus 1 ones," where 1 four is the same as two twos. That is, \[ 101 = 1 \cdot 10 \cdot 10 + 0 \cdot 10 + 1 \cdot 1 \text{ (🤯: Wait… WHAT?!?) }\] Yes, this equation, which may appear to be written in the base-ten system, is correct in the base-two place value system, too! "10" is how the number two is represented in base-two notation!
As an analogy, the string "pie" signifies different things in English (a baked dessert) and Spanish (a foot.) You must take care to know which context you are working in!

It is traditional to use some extra notation to indicate when the strings "10" and "101" are not base-ten numerals to avoid confusion. In this textbook, numerals in any base other than ten will be written between a pair of parentheses followed by a subscript indicating the base. The subscript is written as a base-ten numeral. For example, we could rewrite the previous equation as \[(101)_{2} = (1)_2 \cdot (10)_2 \cdot (10)_2 + (0)_2 \cdot (10)_2 + (1)_2 \cdot (1)_2 \] which translates into base-ten as $5 = 1 \cdot 2 \cdot 2 + 0 \cdot 2 + 1 \cdot 1.$ We can also write $5 = (101)_2$ which is a way of saying that the base-ten numeral and the base-two numeral signify the same number.
NOTE: The reason we use base-ten numerals as the subscripts on numerals in other bases is that base-ten is so dominant: It is the "privileged" base, so we need to indicate when a different base is being used… and we don’t need to use the parentheses or subscripts if we are already working in base-ten.

The parentheses and subscript are not necessary if it is clear from the context that a numeral is not a base-ten numeral. For example, \[ \text{chmod 755 hello.txt} \] is a Unix/Linux command that changes the file permission bits (read, write, execute) of the file "hello.txt" for the file’s owner, the file’s group, and any other user. In this example, the string "755" is not a base-10 numeral, but is in octal (base-eight). Octal will be discussed later in the chapter. No subscript is used in the Unix/Linux command because it is natural to an experienced user of that operating system to use octal in the context.
In fact, the octal numeral "755" is used here as an encoding of three bitstrings, where each bitstring is of length 3; this idea is discussed in a later subsection of this chapter.
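To make the encoding concrete, here is a minimal Python sketch that splits the octal numeral "755" into its three bitstrings of length 3. The function name octal_to_permission_bits and the labels are made up for illustration; the order read, write, execute within each block follows the usual Unix convention.

```python
def octal_to_permission_bits(octal_numeral):
    """Map each octal digit to its block of 3 bits (read, write, execute)."""
    return [format(int(digit, 8), "03b") for digit in octal_numeral]

labels = ["owner", "group", "other"]
for label, bits in zip(labels, octal_to_permission_bits("755")):
    # Show the letter for each bit that is 1 and a dash for each bit that is 0.
    flags = "".join(ch if bit == "1" else "-" for ch, bit in zip("rwx", bits))
    print(label, bits, flags)
# prints:
# owner 111 rwx
# group 101 r-x
# other 101 r-x
```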

Also, we can omit the parentheses and subscripts if we want to tell a couple of "jokes:"

You are ready now to learn how to represent numbers using base-two numerals.

2.3.1. An Algorithm That Computes The Digits Of A Base-Two Numeral

In this subsection, an algorithm is presented for computing the digits in the expanded form of a base-two numeral of the natural number $n.$ This algorithm has been adapted from the one stated for base-ten in the previous section. Notice that all numerals used in this algorithm are base-ten numerals unless otherwise indicated.

Task: Given the natural number $n$, compute an array of natural numbers $s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]$ so that each of $r_{0}, r_{1}, \ldots , r_{k}$ is represented by a single digit in base-2, and \[ n = r_{k} \cdot 2^{k} + \ldots + r_{1} \cdot 2^{1} + r_{0} \cdot 2^{0} \] where, for $n \geq 1$, $k$ is the greatest natural number such that $2^{k} \leq n.$
- Input: The natural number $n$
- Steps:
  1. Set $a$ equal to $n$
  2. Set $s$ to the empty array (We will append the values $r_{0}, r_{1}, \ldots , r_{k}$ to the array $s$ as we compute them)
  3. Divide $a$ by 2 to find natural numbers $q$ and $r$ such that both $a = q \cdot 2 + r$ and $0 \leq r < 2.$
  4. Append $r$ to the end of array $s.$
  5. If $q \neq 0$
     - set $a$ equal to $q$
     - go to step 3
  6. Return the array $s.$
- Output: An array of natural numbers $s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]$ where each number is represented by a single digit, and \[ n = r_{k} \cdot 2^{k} + \ldots + r_{1} \cdot 2^{1} + r_{0} \cdot 2^{0}.\]

That is, the algorithm rewrites $n$ as $r_{0} + 2 \cdot (r_{1} + 2 \cdot (r_{2} + \ldots + 2 \cdot (r_{k-1} + 2 \cdot r_{k}) \ldots ))$

Example 3 - Finding The Digits Of A Base-Two Numeral (Binary Notation)

The following equations summarize how the preceding algorithm determines the digits in the base-two expanded form numeral for the number 13.

\begin{equation} \begin{aligned} 13 {} & = 6 \cdot 2 + 1 & q {} & = 6 & r {} & = 1 & s & = [1] \\ 6 {} & = 3 \cdot 2 + 0 & q {} & = 3 & r {} & = 0 & s & = [1, 0] \\ 3 {} & = 1 \cdot 2 + 1 & q {} & = 1 & r {} & = 1 & s & = [1, 0, 1] \\ 1 {} & = 0 \cdot 2 + 1 & q {} & = 0 & r {} & = 1 & s & = [1, 0, 1, 1] \\ \end{aligned} \end{equation}

Notice that the items in $s = [ r_{0}, \, r_{1}, \, r_{2} , \, r_{3} ]$ are the numbers (in base-ten notation) corresponding to the digits of the numeral $“(1101)_2”$ in reverse order, so \begin{equation} \begin{aligned} 13 & = r_{3} \cdot 2^{3} + r_{2} \cdot 2^{2} + r_{1} \cdot 2^{1} + r_{0} \cdot 2^{0} \\ & = 1 \cdot 2^{3} + 1 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0} \end{aligned} \end{equation}

The algorithm rewrites $13$ as $1 + 2 \cdot (0 + 2 \cdot (1 + 2 \cdot 1)).$
Again, notice that the items in the array $[1, 0, 1, 1]$ are listed in reverse order, so $13 = (1101)_2$ where the base-ten numeral and the base-two numeral represent the same number, thirteen.

Here is a link to an alternate method of finding the base-two numeral for a number.

If you made it to this sentence without skipping any of the discussion above, congratulations! If you did skip some of the discussion, go back and try your best to understand what the algorithm in the previous example is computing: The array $s$ holds the digits of the binary notation for the number $n,$ in reverse order. Compare what is done in this algorithm to the one for base-ten in the previous section… they are both computing the digits for a numeral, but in different bases. If you can understand this algorithm, you will likely understand the rest of the chapter.

2.4. The Base-$b$ Place Value System

If you made it here, you are ready to learn how to find, given any natural number $n,$ the numeral that represents $n$ in the base-$b$ place value system (It is assumed that the base $b$ is a natural number greater than or equal to 2.) You can compare the algorithm and example in this subsection to the ones in the preceding subsections for base-ten and base-two.

A base-$b$ numeral is a string formed from one or more digits out of a set that contains $b$ symbols, where each symbol is called a "base-$b$ digit."

The string is read from left-to-right.
Each digit in the string represents a multiple of a power of the base, $b,$ depending on its position in the string.
The rightmost place represents a multiple of 1 (which is $b$ raised to the power zero) and each of the other places represents a multiple of a power of $b$ that is one greater than the power of $b$ represented by the place to its right.
Notice that the base itself, the number $b,$ is represented by the string "10" in the base-$b$ place value system. The string "10" represents the number described by the phrase "1 $b$ plus 0 ones".
As an example, the string "101" represents the number described by the phrase "1 $b$-squared plus 0 $b$'s, plus 1 ones." That is, \[ 101 = 1 \cdot 10 \cdot 10 + 0 \cdot 10 + 1 \cdot 1 \text{ (🤯: Again?!?) }\] Yes, this equation is correct, too, in the base-$b$ place value system!

To avoid confusion, you can enclose each numeral in a pair of parentheses followed by the subscript $b$ to indicate the base, where $b$ is written as a base-ten numeral. For example, the previous equation can be written as \[(101)_{b} = (1)_b \cdot (10)_b \cdot (10)_b + (0)_b \cdot (10)_b + (1)_b \cdot (1)_b \] which translates into base-ten as $b^2 + 1 = 1 \cdot b \cdot b + 0 \cdot b + 1 \cdot 1.$

2.4.1. An Algorithm That Computes The Digits Of A Base-$b$ Numeral

This is an adaptation of the algorithm presented earlier for base-two. Notice that all numerals used in this algorithm are base-ten numerals unless otherwise indicated.

Task: Given the natural number $n$ and a positive integer constant $b > 1,$ compute an array of natural numbers $s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]$ so that each of $r_{0}, r_{1}, \ldots , r_{k}$ can be represented by a single digit in base-$b$, and \[ n = r_{k} \cdot b^{k} + \ldots + r_{1} \cdot b^{1} + r_{0} \cdot b^{0} \] where, for $n \geq 1$, $k$ is the greatest natural number such that $b^{k} \leq n.$
- Input: The natural number $n$ and the base $b$
- Steps:
  1. Set $a$ equal to $n$
  2. Set $s$ to the empty array (We will append the values $r_{0}, r_{1}, \ldots , r_{k}$ to the array $s$ as we compute them)
  3. Divide $a$ by $b$ to find natural numbers $q$ and $r$ such that both $a = q \cdot b + r$ and $0 \leq r < b.$
  4. Append $r$ to the end of array $s.$
  5. If $q \neq 0$
     - set $a$ equal to $q$
     - go to step 3
  6. Return the array $s.$
- Output: An array of natural numbers $s = [ r_{0}, \, r_{1}, \, \ldots, \, r_{k} ]$ where each number is represented by a single digit in base-$b,$ and \[ n = r_{k} \cdot b^{k} + \ldots + r_{1} \cdot b^{1} + r_{0} \cdot b^{0}.\]

The algorithm rewrites $n$ as $r_{0} + b \cdot (r_{1} + b \cdot (r_{2} + \ldots + b \cdot (r_{k-1} + b \cdot r_{k}) \ldots )).$ The result $s$ contains the numbers that let you write $n$ in base-$b$ notation.
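A possible Python translation of the base-$b$ algorithm is sketched below. The function name digits_in_base is made up for illustration, and the check that $b$ is at least 2 is an addition that the algorithm above simply assumes.

```python
def digits_in_base(n, b):
    """Return [r_0, r_1, ..., r_k] with n = r_k*b**k + ... + r_1*b + r_0."""
    if b < 2:
        raise ValueError("the base b must be at least 2")
    a = n                      # step 1
    s = []                     # step 2
    while True:
        q, r = divmod(a, b)    # step 3: a = q*b + r with 0 <= r < b
        s.append(r)            # step 4
        if q == 0:             # step 5
            return s           # step 6
        a = q

print(digits_in_base(13, 2))     # prints [1, 0, 1, 1]
print(digits_in_base(432, 10))   # prints [2, 3, 4]

# Rebuild n from the array to check the expanded form.
s = digits_in_base(13, 2)
print(sum(r * 2**i for i, r in enumerate(s)))   # prints 13
```

Remember that the array lists the ones place first, so it must be read from right to left to obtain the numeral.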

2.4.2. Octal Notation (Base-8)

Example 4 - Finding The Digits Of A Base-8 Numeral (Octal Notation)

The following equations summarize how to determine the digits in the base-8 expanded form numeral for the number 100.

Note that for base-8 we use the eight digits ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, and ‘7’.

\begin{equation} \begin{aligned} 100 {} & = 12 \cdot 8 + 4 & q {} & = 12 & r {} & = 4 & s & = [4] \\ 12 {} & = 1 \cdot 8 + 4 & q {} & = 1 & r {} & = 4 & s & = [4, 4] \\ 1 {} & = 0 \cdot 8 + 1 & q {} & = 0 & r {} & = 1 & s & = [4, 4, 1] \\ \end{aligned} \end{equation}

Notice that the items in $s = [ 4, \, 4, \, 1 ]$ are the numbers (in base-ten notation) corresponding to the base-8 digits of the numeral $“(144)_8”$ in reverse order. You can verify that $100 = 1 \cdot 8^{2} + 4 \cdot 8^{1} + 4 \cdot 8^{0}.$ This means that $100 = (144)_8.$

2.4.3. Hexadecimal Notation (Base-16)

Example 5 - Finding The Digits Of A Base-16 Numeral (Hexadecimal Notation)

The following equations summarize how to determine the digits in the base-16 expanded form numeral for the number 500.

Note that for base-16, we need sixteen digits! It is traditional to use the ten Hindu-Arabic numerals followed by the first six uppercase English letters as the digits: ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’, ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, and ‘F’. So $10 = (A)_{16},$ $11 = (B)_{16},$ $12 = (C)_{16},$ $13 = (D)_{16},$ $14 = (E)_{16},$ and $15 = (F)_{16}.$
Note: Some programming tools use the lowercase letters 'a' through 'f' instead of the uppercase letters. For example, Python's built-in hex function produces lowercase letters.

The remainders stored in the array $s$ are represented in base-ten notation, and will need to be replaced by the corresponding hexadecimal digits in the base-16 numeral for 500.

\begin{equation} \begin{aligned} 500 {} & = 31 \cdot 16 + 4 & q {} & = 31 & r {} & = 4 & s & = [4] \\ 31 {} & = 1 \cdot 16 + 15 & q {} & = 1 & r {} & = 15 & s & = [4, 15] \\ 1 {} & = 0 \cdot 16 + 1 & q {} & = 0 & r {} & = 1 & s & = [4, 15, 1] \\ \end{aligned} \end{equation}

As before, we have $500 = 1 \cdot 16^{2} + 15 \cdot 16^{1} + 4 \cdot 16^{0},$ which you can verify is true. To write the base-16 numeral for 500, you need to replace "15" in base-ten by $(F)_{16}.$ So $500 = (1F4)_{16}.$
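The last step, replacing each remainder by the corresponding hexadecimal digit, can be sketched in Python as follows. The names HEX_DIGITS and to_hexadecimal are made up for illustration; the final line compares the result with Python's built-in formatting.

```python
HEX_DIGITS = "0123456789ABCDEF"   # the digit for the value r is HEX_DIGITS[r]

def to_hexadecimal(n):
    """Return the base-16 numeral for the natural number n as a string."""
    remainders = []
    a = n
    while True:
        a, r = divmod(a, 16)      # the quotient becomes the new value of a
        remainders.append(r)
        if a == 0:
            break
    # The remainders are in reverse order, so read them from last to first.
    return "".join(HEX_DIGITS[r] for r in reversed(remainders))

print(to_hexadecimal(500))    # prints 1F4
print(format(500, "X"))       # built-in formatting also prints 1F4
```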

2.4.4. A Theorem (To Be Proven Later)

We can summarize what the algorithm does as a mathematical theorem, though technically at this point, it’s only a conjecture, an educated guess based on a few cases that seem to indicate that the algorithm will always work. You will learn a technique that will prove the theorem by validating the algorithm for all choices of natural numbers $n$ and $b>1$ in the Proofs: Mathematical Induction chapter.

Theorem

Let $b$ be an integer greater than 1. Any positive integer $n$ can be expressed uniquely in the form \[n = r_kb^k + r_{k - 1}b^{k-1} + \cdots + r_1b^1 + r_0b^0,\]where $k$ is a nonnegative integer, $r_0,r_1,\dots,r_k$ are nonnegative integers less than $b,$ and $r_k \neq 0.$

2.5. Converting From Base-$b$ to Base-Ten

In this section you will learn how to rewrite a base-$b$ numeral in base-ten.

Example 6

What is the decimal expansion of the positive integer with base 7 expansion $(1063)_7$?

Solution

We have

\[\begin{split} (1063)_7 &= 1 \cdot 7^3 + 0 \cdot 7^2 + 6\cdot 7^1 + 3 \cdot 7^0\\ &=1 \cdot 343 + 0 \cdot 49 + 6 \cdot 7 + 3 \cdot 1\\ &= 343 + 0 + 42 + 3\\ &= 388. \end{split}\]

Several common bases used in computer science are base $2$, base $8$, and base $16$, which are referred to as binary, octal, and hexadecimal, respectively. Binary digits are often referred to as bits. Note that, when finding the hexadecimal expansion of a positive integer, in addition to the usual digits $0$ through $9,$ we require an additional 6 digits. We will represent these by the letters $\mathrm{A}$ through $\mathrm{F}$, where $(\mathrm{A})_{16} = 10,$ $(\mathrm{B})_{16} = 11,$ $(\mathrm{C})_{16} = 12,$ $(\mathrm{D})_{16} = 13,$ $(\mathrm{E})_{16} = 14,$ and $(\mathrm{F})_{16} = 15.$

Example 7 - Hexadecimal expansion

Find the decimal expansion of the positive integer whose hexadecimal expansion is $(5\mathrm{B}\mathrm{F})_{16}.$

Solution

We have

\[\begin{split} (5\mathrm{B}\mathrm{F})_{16} &= 5\cdot 16^2 + 11 \cdot 16^1 + 15 \cdot 16^0\\ &= 5\cdot 256 + 11 \cdot 16 + 15 \cdot 1\\ &= 1280 + 176 + 15\\ &= 1471. \end{split}\]
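The expanded-form computation used in the two solutions above can be sketched in Python. The names DIGITS and to_number are made up for illustration, and the sketch only handles bases from 2 through 16 with uppercase letter digits.

```python
DIGITS = "0123456789ABCDEF"

def to_number(numeral, b):
    """Return the number represented by the base-b numeral (a string), 2 <= b <= 16."""
    total = 0
    # The rightmost symbol is in position 0, the next one in position 1, and so on.
    for position, symbol in enumerate(reversed(numeral)):
        value = DIGITS.index(symbol)        # for example, "B" has the value 11
        if value >= b:
            raise ValueError(f"{symbol!r} is not a base-{b} digit")
        total += value * b**position
    return total

print(to_number("1063", 7))   # prints 388
print(to_number("5BF", 16))   # prints 1471
print(int("5BF", 16))         # the built-in int function also prints 1471
```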

2.6. Base Conversion Among Binary, Octal, and Hexadecimal

One of the ways that octal (base-eight) and hexadecimal (base-sixteen) are used in computer science is to abbreviate long bitstrings. The following examples will show how this is done.

Suppose you need to convert a numeral from hexadecimal to binary. One method would be to first convert from hexadecimal to decimal, and then convert the result from decimal to binary. However, it is much more efficient to notice that since $2^4 = 16,$ you can express each hexadecimal digit as a block of 4 bits (that is, a bitstring of length 4) as follows:

\[\begin{array}{llll} (0)_{16} = (0000)_2 & (1)_{16} = (0001)_{2}& (2)_{16} = (0010)_2 & (3)_{16} = (0011)_2 \\ (4)_{16} = (0100)_2& (5)_{16} = (0101)_2& (6)_{16} = (0110)_2 & (7)_{16} = (0111)_2\\ (8)_{16} = (1000)_2& (9)_{16} = (1001)_2& (\mathrm{A})_{16} = (1010)_2& (\mathrm{B})_{16} = (1011)_2\\ (\mathrm{C})_{16} = (1100)_2& (\mathrm{D})_{16} = (1101)_2& (\mathrm{E})_{16} = (1110)_2& (\mathrm{F})_{16} = (1111)_2. \end{array}\]

You can then concatenate the blocks, and remove any leading zeros if you need to.

Example 8 - Hexadecimal to Binary Conversion

Find the binary expansion of $(4\mathrm{C}\mathrm{A}7)_{16}.$

Solution

Each hexadecimal digit can be replaced by a block of 4 bits:

\[\begin{array}{llll} (4)_{16} = (0100)_2 & (\mathrm{C})_{16} = (1100)_2 & (\mathrm{A})_{16} = (1010)_2 & (7)_{16} = (0111)_2. \end{array}\]

This means that you can write either \[(4\mathrm{C}\mathrm{A}7)_{16} = (0100110010100111)_{2}\] or, if the leading zero is not needed, \[(4\mathrm{C}\mathrm{A}7)_{16} = (100110010100111)_{2}.\] Why wouldn’t you always delete leading zeroes? Notice that a bitstring of length 4 can be used to encode a sequence of "Yes/No" or "True/False" answers. As an example, since $(6)_{16} = (0110)_2,$ the hexadecimal digit $(6)_{16}$ can be used to encode the sequence of 4 answers "No, Yes, Yes, No" to a Yes/No survey, and in this context the leftmost bit should be kept to make it clear that the answer to the first question was "No" (as opposed to the sequence "Yes, Yes, No, blank" where the fourth question was not answered.)
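The digit-by-digit replacement can be sketched in Python as follows. The function name hex_to_binary and its keep_leading_zeros parameter are made up for illustration.

```python
def hex_to_binary(hex_numeral, keep_leading_zeros=True):
    """Replace each hexadecimal digit by its block of 4 bits."""
    blocks = [format(int(digit, 16), "04b") for digit in hex_numeral]
    bits = "".join(blocks)
    if not keep_leading_zeros:
        bits = bits.lstrip("0") or "0"   # keep a single 0 for the number zero
    return bits

print(hex_to_binary("4CA7"))                             # prints 0100110010100111
print(hex_to_binary("4CA7", keep_leading_zeros=False))   # prints 100110010100111
print(hex_to_binary("6"))                                # prints 0110
```

Keeping the leading zeros by default matches the survey interpretation, where every bit in a block carries an answer.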

To convert a numeral from binary to hexadecimal, first break up the binary notation into blocks of 4 bits, adding a suitable number of leading zeros if necessary. Next, convert each block of 4 bits to a hexadecimal digit and concatenate the results, removing any leading zeros if necessary.

Example 9 - Binary to Hexadecimal Conversion

Find the hexadecimal expansion of $(110 1011 1111)_2.$

Solution

There are three blocks of 4 bits: \[0110,\ 1011,\ 1111.\] Since $(0110)_2 = (6)_{16},$ $(1011)_2 = (\mathrm{B})_{16},$ and $(1111)_2 = (\mathrm{F})_{16},$ \[(11010111111)_{2} = (6\mathrm{B}\mathrm{F})_{16}.\]
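The reverse conversion, from binary to hexadecimal, can be sketched in the same way. The function name binary_to_hex is made up for illustration; spaces in the input are ignored so that the bitstring can be typed in groups.

```python
def binary_to_hex(bits):
    """Convert a bitstring to a hexadecimal numeral, 4 bits at a time."""
    bits = bits.replace(" ", "")
    # Pad on the left with zeros so that the length is a multiple of 4.
    padded_length = -(-len(bits) // 4) * 4
    bits = bits.zfill(padded_length)
    blocks = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    digits = "".join(format(int(block, 2), "X") for block in blocks)
    return digits.lstrip("0") or "0"

print(binary_to_hex("110 1011 1111"))   # prints 6BF
```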

A similar method can be used to convert between octal and binary. Since $2^3 = 8,$ each octal digit can be written uniquely as a block of 3 bits as follows:

\[\begin{array}{llll} (0)_{8} = (000)_2 & (1)_{8} = (001)_{2}& (2)_{8} = (010)_2 & (3)_{8} = (011)_2 \\ (4)_{8} = (100)_2& (5)_{8} = (101)_2& (6)_{8} = (110)_2 & (7)_{8} = (111)_2. \end{array}\]

We then concatenate blocks, removing any leading zeros if necessary.

Also, the following table can be used to convert quickly between decimal, hexadecimal, octal, and binary in a similar way.

Conversion table for different bases

Decimal	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Hexadecimal	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
Octal	1	2	3	4	5	6	7	10	11	12	13	14	15	16	17
Binary	1	10	11	100	101	110	111	1000	1001	1010	1011	1100	1101	1110	1111
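If you want to check or extend the table, its rows can be regenerated with Python's built-in format function. This is only a sketch; the function name conversion_row is made up for illustration, and the format codes "d", "X", "o", and "b" select decimal, uppercase hexadecimal, octal, and binary output.

```python
def conversion_row(code, first=1, last=15):
    """Return the numerals for first..last in the base selected by the format code."""
    return [format(n, code) for n in range(first, last + 1)]

rows = [("Decimal", "d"), ("Hexadecimal", "X"), ("Octal", "o"), ("Binary", "b")]
for name, code in rows:
    # Print one tab-separated row per base.
    print(name, *conversion_row(code), sep="\t")
```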

2.7. Exercises

Convert to decimal (base 10)
1. $(10262)_7$
2. $(30A8)_{16}$
3. $(1000010001100)_2$
4. $(12307)_{60}$
Convert $\left(2039\right)_{10}$ from decimal (base 10) to
1. base 7
2. binary
3. hexadecimal (base 16)
4. octal (base 8)
Convert $\left(2599\right)_{10}$ from decimal to
1. base 5
2. binary
3. hexadecimal
4. base 3
Convert the following hexadecimal numerals to binary numerals
1. $\left(6F203\right)_{16}$
2. $\left(3FA20C45\right)_{16}$
3. $\left(FACE\right)_{16}$
Convert the following binary numerals to hexadecimal numerals
1. $\left(1111100111010101101\right)_2$
2. $\left(10001111101011\right)_2$
3. $\left(1100101011111110\right)_2$

3. Counting: Arithmetic Techniques

This chapter was last updated on February 2, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

You probably learned how to count objects one by one when you were a child. You may have counted up to 5 on the fingers on one hand, or up to 10 on the fingers of two hands, or up to 20 on fingers and toes - that is, you paired the objects (fingers and/or toes) with number names to create a one-to-one correspondence. You may have counted all the way up to 100.

However, many problems in computer science or mathematics require you to count the number of elements in sets that contain millions of elements, or billions of elements, or even more elements, so counting one by one is inefficient or impossible. Starting with this chapter, and continuing later in the Set Theory and Counting: Permutations and Combinations chapters, you will study ways to quickly and efficiently count the number of elements of any set no matter the size of the set.

Key terms and concepts covered in this chapter:

Counting arguments
- Sum Rule
- Product Rule
- Division Rule (also called the Rule Of Quotient)
- Subtraction Rule (also called the Principle of Inclusion-Exclusion for two sets)
The pigeonhole principle

3.1. Some Foundational Counting Principles

Each of the four arithmetic operations corresponds to a rule we can use to count quickly.

3.1.1. The Sum Rule

The sum rule, also called the addition principle or the rule of sum, describes the number of possible choices of one element from a union of two sets that share no common elements (Such sets are called disjoint sets).

The Sum Rule

Suppose that we have a procedure that consists of completing one step, and that the step can be chosen from either a first set of $j$ possible ways to complete the step or from a second set of $k$ possible ways to complete the step, and that no way of completing the step is in both the first and second sets. Then there are $j+k$ ways to complete the procedure.

This is called the sum rule for counting because it involves adding to find a total. The sum rule can also be extended to more than two sets, as long as no two of the sets have any elements in common.

Example 1

A student in a Capstone course can choose a project from three different professors. The professors have 3, 7, and 4 possible projects, and no project is on more than one professor’s list. How many possible projects can the student choose?

Solution

The student can pick out a project by choosing from the first professor, the second professor, or the third professor. Since no project is on more than one list, the sum rule says that there are $3 + 7 + 4 = 14$ projects to choose from.
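The same count can be checked with Python sets. The project labels below are hypothetical; only the list sizes 3, 7, and 4 come from the example.

```python
# Hypothetical project labels for the three professors' lists.
first = {f"A{i}" for i in range(1, 4)}    # 3 projects
second = {f"B{i}" for i in range(1, 8)}   # 7 projects
third = {f"C{i}" for i in range(1, 5)}    # 4 projects

# The Sum Rule applies only if no two of the sets share an element.
assert first.isdisjoint(second)
assert first.isdisjoint(third)
assert second.isdisjoint(third)

all_projects = first | second | third           # the union of the three sets
print(len(all_projects))                        # prints 14
print(len(first) + len(second) + len(third))    # prints 14
```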

You Try

A card will be drawn from a standard 52-card deck.
How many ways can the card be an even number or a king?
Image credit: This is a cropped version of "Set Of Playing Card". George Hodan has released this “Set Of Playing Card” image under Public Domain license (CC0 Public Domain).

3.1.2. The Subtraction Rule

The subtraction rule describes the number of possible choices of one element from a union of two sets that do share one or more common elements.

The Subtraction Rule

Suppose that we have a procedure that consists of completing one step, and that the step can be chosen from either a first set of j possible ways to complete the step or from a second set of k possible ways to complete the step, but that there are m possible ways of completing the step that are in both the first and second sets. Then there are $j+k-m$ ways to complete the procedure.

The subtraction rule is a special case of the inclusion-exclusion principle that involves only two sets. The inclusion-exclusion principle is discussed in more detail in the Set Theory and Proofs: Mathematical Induction chapters.

Example 2

In a group of students, 17 are enrolled in discrete mathematics, 13 are enrolled in probability, and 6 are enrolled in both discrete mathematics and probability.
How many students are in the group?

Solution

There are 17 students enrolled in discrete mathematics, but 6 of these students are enrolled in both discrete mathematics and probability. Likewise, there are 13 enrolled in probability, but 6 of these students are enrolled in both discrete mathematics and probability.
If we applied the sum rule by computing $17+13 = 30$, we would count the 6 students who are enrolled in both discrete mathematics and probability twice, once as students enrolled in discrete mathematics and then again as students enrolled in probability. We can repair the count by subtracting 6 from the sum of 17 and 13; this will count each of the students exactly once. That is, the number of students in the group is $17+13-6=24.$
Alternative solution
First form three sets that have no students in common: Subtract 6 from each of 17 and 13 to find that there are $17-6=11$ students in the set of students who are enrolled in discrete mathematics but not probability, $13-6=7$ students in the set of students who are enrolled in probability but not discrete mathematics, and 6 students in the set of students who are enrolled in both discrete mathematics and probability. Notice that the Sum Rule can be used because no student can be in two (or more) of the sets: There are $(17-6)+(13-6)+6 = 24$ students, which is the correct count.
Also note that $(17-6)+(13-6)+6 = 17+(-6)+13+(-6)+6 = 17 + 13 -6 =24,$ which is the computation used in the first solution.
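
Both solutions to Example 2 can be checked with Python sets. This is only a sketch: the student ID numbers are hypothetical, chosen so that the counts 17, 13, and 6 match the example.

```python
# Hypothetical student IDs; only the counts 17, 13, and 6 come from Example 2.
both = set(range(0, 6))                  # 6 students enrolled in both courses
discrete = both | set(range(6, 17))      # 17 students in discrete mathematics
probability = both | set(range(17, 24))  # 13 students in probability

j = len(discrete)
k = len(probability)
m = len(discrete & probability)
print(j, k, m)  # 17 13 6

# Subtraction rule: j + k - m
print(j + k - m)                    # 24
# Direct count of the group
print(len(discrete | probability))  # 24
# Alternative solution: three sets with no students in common
print((j - m) + (k - m) + m)        # 24
```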

Example 3

A tech company has 200 applicants for a position. Of the applicants, 150 were computer science majors, 43 were business majors, and 25 were double majors in both computer science and business.
How many applicants did not major in either computer science or business?

Solution

By the subtraction rule, there are $150 + 43 - 25 = 168$ applicants that majored in computer science or business (or both). The number of applicants who did not major in either computer science or business is $200 - 168 = 32.$

Video Example

The following video example features Dr. Joshua Roberts, Associate Professor of Mathematics at Georgia Gwinnett College.

Notice that Dr. Roberts uses a Venn diagram to represent the sets in this video. Venn diagrams are covered in the Set Theory chapter of the Remix.

3.1.3. The Product Rule

The product rule, also called the multiplication principle or the rule of product, describes the number of possible choices of two successive elements where the first element comes from one set and the second from another set (which could be the same set as the first set).

To find the total number of outcomes for two or more successive events where both events must occur, multiply the number of outcomes for each event together. For instance, if you want to find the number of outcomes possible when you roll a die and toss a coin, you could use the product rule. It is important to note that the events must be independent, meaning one doesn’t affect the other.

The Product Rule

Suppose that we have a procedure that consists of completing two steps, that there are k possible ways to complete the first step, and for each possible way of completing the first step, there are m possible ways to complete the second step. Then there are k⋅m ways to complete the procedure.

Example 4

Suppose there are 27 computers in a computer center and each computer has 15 ports. How many different ways are there to choose a specific port?

Solution

Choosing a port means you first choose a computer and then a port on that computer. Since there are 27 computers and 15 ways to choose a port on a computer, there are $(27)(15) = 405$ ways to choose a port.
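
A brute-force check of Example 4 is sketched below using `itertools.product` from the Python standard library. Numbering the computers and ports from 1 is an assumption made only for illustration.

```python
from itertools import product

computers = range(1, 28)  # 27 computers, numbered 1..27 (hypothetical numbering)
ports = range(1, 16)      # 15 ports on each computer, numbered 1..15

# Each choice is a pair (computer, port): first pick a computer, then a port.
choices = list(product(computers, ports))

print(len(choices))  # 405
print(choices[0])    # (1, 1)
```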

You Try

How many functions are there from a set $A$ with $m$ elements to a set $B$ with $n$ elements?

Hint

Try working out an example where $m$ and $n$ are small natural numbers. For example, how many functions are there from the set $\{ 1, 2, 3, 4 \}$ to the set $\{ \text{"red"},\,\text{"green"},\,\text{"blue"} \}$?

Video Example

The following video example features Dr. Joshua Roberts, Associate Professor of Mathematics at Georgia Gwinnett College.

Example 5

Example 6

You can use more than one of the rules to solve a problem.

How many bitstrings of length four start with 1 or end with 00?

Solution

First, a bitstring of length four that starts with 1 will be of the form $1~*~*~*$, where there are two choices for each $*$, either 0 or 1. Use the product rule to compute that there are $(1)(2)(2)(2) = 2^3 = 8$ bitstrings of this form.

Secondly, a bitstring of length four that ends with 00 will be of the form $*~*~0~0$, so there are $(2)(2)(1)(1) = 2^2 = 4$ bitstrings of this form.

Thirdly, a bitstring of length four that starts with 1 and ends with 00 will be of the form $1~*~0~0$, so there are $(1)(2)(1)(1) = 2$ bitstrings of this form.

Now use the subtraction rule to compute the number of bitstrings of length four that start with 1 or end with 00 (or both): $8 + 4 - 2 =10.$
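
The counts in Example 6 are small enough to confirm by listing every bitstring of length four. The following sketch does so with `itertools.product`; the variable names are illustrative only.

```python
from itertools import product

# All 16 bitstrings of length four, as strings such as "0101".
strings = ["".join(bits) for bits in product("01", repeat=4)]

starts_with_1 = {s for s in strings if s.startswith("1")}
ends_with_00 = {s for s in strings if s.endswith("00")}

print(len(starts_with_1))                 # 8
print(len(ends_with_00))                  # 4
print(len(starts_with_1 & ends_with_00))  # 2  (start with 1 AND end with 00)
print(len(starts_with_1 | ends_with_00))  # 10 (start with 1 OR end with 00)
```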

You Try

If a card is drawn from a standard 52-card deck, how many ways can the card be black or a face card (that is, either a Jack or a Queen or a King)?

3.1.4. The Division Rule

This rule is used when there are $n$ ways to complete a procedure, but each of those ways is equivalent to $d$ ways (including the way itself). That is, every possible outcome of the procedure can be reached in $d$ different ways.

The Division Rule

Suppose that we have a procedure that can be completed in n possible ways, but that for each way of completing the procedure there are d possible ways with the same outcome. Then there are $\frac{n}{d}$ ways to complete the procedure.

The next example uses both the product and division rules.

Example 7

Four students will sit around a circular table, but two seatings are considered "not different" whenever each student has the same left neighbor and right neighbor. How many different seatings are there?

Solution

First use the product rule to find that there are $(4)(3)(2)(1) = 24$ possible ways for the 4 students to sit. (For example, there are 4 choices for the "North" chair, then 3 choices remaining for the "East" chair, then 2 choices remaining for the "South" chair, and 1 choice remaining for the "West" chair.) Next, notice that if all the students shifted one chair to the left once, twice, or thrice, they would all have the same neighbors that they originally had. This means that each of the 24 ways the students can sit belongs to a group of 4 ways (the original seating and its three shifts) that are not considered different.
Therefore, there are $\frac{24}{4} = 6$ different seatings.
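
The division rule in Example 7 can be checked by brute force. In the sketch below, the student names and the helper function `canonical` are hypothetical. Two seatings count as "not different" exactly when one is a rotation of the other, so each seating is rotated into a standard form before counting.

```python
from itertools import permutations

students = ("A", "B", "C", "D")  # hypothetical student names

def canonical(seating):
    """Rotate the seating so that student "A" sits in the first chair."""
    i = seating.index("A")
    return seating[i:] + seating[:i]

# Product rule: all 4 * 3 * 2 * 1 ways to fill the chairs in order.
all_seatings = list(permutations(students))

# Seatings that are rotations of one another share the same canonical form.
distinct = {canonical(s) for s in all_seatings}

print(len(all_seatings))  # 24
print(len(distinct))      # 6
```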

3.2. The Pigeonhole Principle

A surprising number of counting problems can be solved with the so-called pigeonhole principle.

Pigeonhole Principle

If $k+1$ pigeons are roosting in $k$ pigeonholes then at least one pigeonhole must contain more than one pigeon.

NOTE: The Pigeonhole Principle is often attributed to Peter Gustav Lejeune Dirichlet, who called it the Schubfachprinzip. The remixer is willing to speculate that this principle has been known for at least as long as humans have kept birds such as pigeons.

Click here to see a photoshopped image that illustrates this principle.

Example

In a group of 367 people at least two will have the same birthday because there are only 366 possible birthdays (counting February 29).
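
For a small number of pigeonholes, the pigeonhole principle can be confirmed by checking every possible assignment. The sketch below uses $k = 3$, an arbitrary small value chosen for illustration, and shows that every way of placing $k+1$ pigeons into $k$ pigeonholes puts more than one pigeon in some pigeonhole.

```python
from collections import Counter
from itertools import product

k = 3  # number of pigeonholes (a small hypothetical value)

# Each tuple records which of the k holes each of the k + 1 pigeons chose.
assignments = list(product(range(k), repeat=k + 1))

# An assignment is "crowded" if some hole holds more than one pigeon.
crowded = [a for a in assignments if max(Counter(a).values()) > 1]

print(len(assignments))                  # 81
print(len(crowded) == len(assignments))  # True
```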

You Try

How many people, with English names, must be in a room for at least two of the people to have first names that start with the same letter?

3.3. Exercises

There are 67 mathematics majors and 124 computer science majors at a college. There is no student who is both a mathematics major and a computer science major.
1. In how many ways can two representatives be picked so that one is a mathematics major and one is a computer science major?
2. In how many ways can one representative be picked who is either a mathematics major or a computer science major?
A multiple-choice test contains 20 questions, and each question has four choices.
1. In how many ways can a student answer all of the questions on the test if each question must be answered?
2. In how many ways can a student answer all of the questions if the student is allowed to not answer one or more questions?
How many different three-letter initials, using uppercase English letters, are there?
How many different three-letter initials, using uppercase English letters, end with "R"?
How many bit strings are there of length five?
How many bit strings are there of length five that begin and end with 1?
How many bit strings are there of length less than $n$, where $n$ is a positive integer, that start and end with 1?
How many license plates can be made using three digits followed by four uppercase English letters if:
1. Digits and letters can be repeated?
2. Digits and letters cannot be repeated?
Each student in a Discrete Mathematics class is a mathematics major, a computer science major, or a double major in both mathematics and computer science. If the class has 5 mathematics majors (including double majors), 23 computer science majors (including double majors), and 7 double majors, how many students are in the class?
Suppose a computer system requires a password of length no less than 7 and no more than 10 characters, and each character must be an English lowercase letter, an English uppercase letter, a digit, or one of six special characters (*, >, <, !, +, =).
1. How many different passwords are available?
2. Suppose a hacker can check a potential password once every nanosecond (1 nanosecond is $1 \times 10^{-9}$ seconds). How long will it take the hacker to check every potential password?
Suppose that there are 29 students in a class, all of whose last names use only English letters. Explain why at least 2 students in the class have last names that begin with the same letter.
Show that in any set of 5 integers, there are at least two of them that have the same remainder when divided by 4.
A bag contains 8 red balls and 7 blue balls.
1. How many balls must be chosen to be sure of choosing 3 of the same color?
2. How many must be chosen to be sure of choosing 3 red balls?
Someone cleaning out their attic finds a box containing 12 rock CDs and 12 country CDs. What is the minimum number of CDs they can take out to guarantee at least one of each type?
Give an argument that there are at least two people in California with the same number of hairs on their head.

4. Set Theory

This chapter was last updated on September 15, 2025.
Fixed some typos.

Set theory, along with logic, is the foundation of mathematics in our time. In earlier eras, people tried to use arithmetic (e.g., counting and whole numbers) and geometry (e.g., measurement of lengths, areas, and volumes along with geometric constructions) as the foundations of all mathematics. However, over the last two centuries, mathematicians gained new understandings of issues with these traditional foundations which led them to seek a new, firmer foundation. For now, that firmer foundation uses set theory: The study of collections of objects and how those collections can be combined, associated, and themselves added to other collections.
NOTE: If you want to dig much more deeply into what the issues with using arithmetic or geometry as the foundation of mathematics are, you can start with this Wikipedia page which includes a brief description of the "foundational crisis of mathematics."

Key terms and concepts covered in this chapter:

Sets
- Subsets of a set
- The empty set
- The power set of a set
- Cartesian products
Venn diagrams
Cardinality and countability of finite and infinite sets
- Set cardinality and counting
Operations with sets: Union, intersection, complement, and others
- DeMorgan’s laws
- Inclusion-exclusion principle

4.1. Sets

A set is an unordered collection of objects, called elements or members. A set is said to contain its elements.

If $x$ is an element of the set $S,$ then we write $x \in S$. If $x$ is not an element of the set $S$, then we write $x \not\in S$. For example, if $S$ is the set of names of states in the United States of America, then “New York” is an element of $S$ and “Ontario” is not an element of $S,$ that is \[ \text{“New York”} \in S \text{ and } \text{“Ontario”} \not\in S. \] As another example, if $E$ is the set of even integers, then $2 \in E$ and $3 \not\in E.$

4.1.1. Describing A Set: The Roster Method

One way of describing a set is the roster method: List all the elements of the set between curly braces. For example, \[A = \{1,-2,0,1,-3\} \] is the set whose elements are $-3,$ $-2,$ $0,$ and $1.$

Notice that the set $A$ contains exactly $4$ elements, even though the element $1$ appears twice in the roster - duplicate entries do not matter.
Also, the order of the elements in the list does not matter. That is, $\{-3,-2,0,1 \}$ and $\{0, 1,-2,-3\}$ are two more ways of describing the same set $A.$

Example 1 - Checking Set Membership in Python

The code below checks to see if $5$ and $0$ are elements of the set $A = \{1,-2,0,1,-3\}.$ Since $5 \not\in A$ and $0 \in A,$ the code prints False followed by True.
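
A minimal sketch of such a membership check, using Python's `in` operator (the variable name `A` simply mirrors the name of the set in the text):

```python
# The roster {1, -2, 0, 1, -3}; the duplicate 1 is stored only once.
A = {1, -2, 0, 1, -3}

print(5 in A)  # False, since 5 is not an element of A
print(0 in A)  # True, since 0 is an element of A
```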

Example 2 - Listing the Elements of a Set in Python

The code below lists all of the elements of the set $A = \{1,-2,0,1,-3\}.$

Notice that 1 appears in the set once even though it appears in the roster twice.
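
A minimal sketch of such a listing follows. Python sets are unordered, so the sketch sorts the elements before printing to get a predictable order; the sorting step is a choice made here, not a requirement.

```python
A = {1, -2, 0, 1, -3}

# Sort the elements so that they print in a predictable order.
elements = sorted(A)
for x in elements:
    print(x)

print(len(elements))  # 4, because the duplicate 1 is stored only once
```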

A WARNING ABOUT THE PYTHON EXAMPLES INVOLVING SETS: The mathematical set $A = \{1,-2,0,1,-3\}$ is a constant - you cannot change the set by removing elements or inserting new elements. However, Python objects of type set are mutable so it is possible to remove elements or insert new elements, as shown in the following code example. The mathematically correct implementation of sets in Python uses objects of type frozenset because objects of type frozenset are immutable, just like mathematical sets. But there are advantages to using type set instead of frozenset: The roster method notation can be used to initialize or print a Python set, but cannot be used with a Python frozenset: You must call the frozenset constructor to create and initialize a frozenset. The authors of the original “Discrete Math” chose to use Python sets instead of frozensets in the code examples; the author of this remix made the same choice.

WARNING: Python Sets Are Mutable, Mathematical Sets Are "Frozen"

The mathematical set $A$ does not allow removals or insertions, but the Python set $A$ does. The frozenset $F$ is a more faithful implementation of the mathematical set $A$, but notice that the symbolism for Python sets more closely matches the symbolism used for mathematical sets.
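
The difference between the two types can be sketched as follows. The names `A` and `F` follow the text; the inserted value 7 and the flag `frozen_changed` are arbitrary choices for illustration.

```python
A = {1, -2, 0, 1, -3}             # a mutable Python set
F = frozenset({1, -2, 0, 1, -3})  # an immutable Python frozenset

A.add(7)     # insertion is allowed for a set
A.remove(0)  # removal is allowed for a set
print(A == {1, -2, -3, 7})  # True: the Python set A has changed

# A frozenset has no add or remove methods, so this attempt fails.
try:
    F.add(7)
    frozen_changed = True
except AttributeError:
    frozen_changed = False

print(frozen_changed)                  # False
print(F == frozenset({-3, -2, 0, 1}))  # True: F is unchanged
```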

4.1.2. Describing A Set: Set Builder Notation

Another way of describing a set is the use of set builder notation. We write a set as \[\{x \in D : P(x)\}.\] This is the set of all elements $x$ from a domain $D$ that satisfy the predicate $P(x).$ We can use either the colon $:$ or the vertical bar $|$ as the separator in this notation. For example, $\{ x \in \mathbb{N} \, | \, x^{2} \leq 50 \}$ is the set of natural numbers that are less than $\sqrt{50},$ that is, $\{0, 1, 2, 3, 4, 5, 6, 7\}.$

Yet another way of describing a set is to use a function or an algebraic expression, as in \[\{ f(x) : x \in D \}.\]This is the set of all values $f(x)$ for $x$ in the domain $D$. For example, $\{ 2n : n \in \mathbb{N} \}$ is the set of the even natural numbers. Again, we can use either the colon $:$ or the vertical bar $|$ as the separator.

Example 3 - Set Builder Notation in Python

The set $\{x \in D: P(x)\}$ can be expressed in Python as the set comprehension {x for x in D if P(x)}. For example, the code below defines the set $B$ as the set of positive elements of the set $A = \{1,-2,0,1,-3\}.$
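
A minimal sketch using a Python set comprehension is shown here. The second part illustrates the form $\{ f(x) : x \in D \}$ on a small finite domain `D`, which is a hypothetical choice because Python cannot store all of $\mathbb{N}.$

```python
A = {1, -2, 0, 1, -3}

# {x in A : x > 0} written as a set comprehension
B = {x for x in A if x > 0}
print(B)  # {1}

# {2n : n in D} for a small finite domain D (hypothetical)
D = {0, 1, 2, 3}
evens = {2 * n for n in D}
print(sorted(evens))  # [0, 2, 4, 6]
```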

4.1.3. Describing A Set: Special Sets Of Numbers

You may already be familiar with the following sets of numbers, which are listed here for reference.

Special sets of numbers

$\mathbb{N} = \{0, 1, 2, 3,...\}$, the set of natural numbers
$\mathbb{Z} = \{...,-2, -1, 0, 1, 2,...\}$, the set of integers
$\mathbb{Z}^+ = \{1, 2, 3,...\}$, the set of positive integers
$\mathbb{Q} = \left\{\left.\frac{a}{b}\right|a\in \mathbb{Z},b\in \mathbb{Z},b\neq 0\right\}$, the set of rational numbers
$\mathbb{Q}^+$, the set of positive rational numbers
$\mathbb{R}$, the set of real numbers
$\mathbb{R}^+$, the set of positive real numbers
$\mathbb{C} = \{a+bi : a\in \mathbb{R},b\in \mathbb{R},i^{2}=-1\}$, the set of complex numbers.

Other special sets will be defined as needed.

4.1.4. Describing A Set: Switching Between Representations

A set can usually be described in more than one way, as shown in the following example.

Example 4 - Switching between representations

Consider the following set: \[\{x \in \mathbb{Z} : -2 \leq x < 4\}.\] This is the set of all integers $x$ such that $-2$ is less than or equal to $x$ and $x$ is less than 4. Using the roster method, this set can be written as \[\{-2,-1,0,1,2,3\}.\]
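
The two descriptions in Example 4 can be compared in Python. Because $\mathbb{Z}$ is infinite, the sketch below searches only a finite window of integers; the window from $-100$ to $100$ is an arbitrary assumption that is wide enough to contain the whole set.

```python
# Z is infinite, so search a finite window that is wide enough to contain the set.
window = range(-100, 101)

built = {x for x in window if -2 <= x < 4}  # set builder description
roster = {-2, -1, 0, 1, 2, 3}               # roster description

print(built == roster)  # True
```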

You Try

Match each set described using set builder notation in parts (a) through (f) with the same set described using the roster method in parts (A) through (F).

(a) $\{x \in \mathbb{Z} : x^2 = 1\}$
(b) $\{x \in \mathbb{Z} : x^3 = 1\}$
(c) $\{x \in \mathbb{Z} : |x| \leq 2\}$
(d) $\{x \in \mathbb{Z} : x^2 < 4\}$
(e) $\{x \in \mathbb{Z} : x < |x|\}$
(f) $\{x \in \mathbb{Z} : (x + 1)^2 = x^2 + 2x + 1\}$

(A) $\{-1,0,1\}$
(B) $\{\dots, -3,-2,-1,0,1,2,3,\dots\}$
(C) $\{1\}$
(D) $\{\dots, -3,-2,-1\}$
(E) $\{-1,1\}$
(F) $\{-2,-1,0,1,2\}$

When there are too many elements in a set for us to be able to list each one, we often use ellipses ($\dots$) when the pattern is obvious. For example, we have \[\mathbb{Z} = \{\dots,-3,-2,-1,0,1,2,3,\dots\}.\]

4.2. Equality Of Sets

We say that two sets are equal if and only if they contain the same elements. When $A$ and $B$ are equal sets, we write $A = B$. When $A$ and $B$ are not equal sets, we write $A \neq B$.

The three sets $\{2,3,5,7\},$ $\{5,2,7,3\},$ and $\{x \in \mathbb{N} : x \text{ is prime and } x < 10 \}$ are equal sets because they contain the same elements. In fact, $\{2,3,5,7\},$ $\{5,2,7,3\},$ and $\{x \in \mathbb{N} : x \text{ is prime and } x < 10 \}$ are really just three different descriptions of the same set, in the same way that $1 + 3,$ $5 - 1,$ and $2^{2}$ are three different descriptions of the same number, 4. The extended equality \[\{2,3,5,7\} = \{5,2,7,3\} = \{x \in \mathbb{N} : x \text{ is prime and } x < 10 \}\] is a true statement for the same reason the extended equality $1 + 3 = 5 - 1 = 2^{2}$ is a true statement.
Note: You may be used to using the equal sign "=" as if it means "simplifies to" in your previous math experience, but "=" actually means "represents the same thing as."

4.3. The Empty Set

Consider the set of all natural numbers whose square is equal to 2, described using set builder notation: $\{x \in \mathbb{N} : x^2 = 2\}.$ If you use the roster method to list all the elements you will get the set $\{ \}$ because there are no natural numbers whose square is equal to 2!

The set $\{ \}$ is called the empty set, or the null set. The symbol $\emptyset$ is used to represent the empty set, too, that is, \[ \emptyset = \{ \}. \]

Example 5 - Listing the Elements of a Nonempty Set

To define the empty set in Python, we must call the constructor set(). Python interprets the empty curly braces {} as an empty object of type dict, called a dictionary, that is used to represent mappings of key:value pairs.

The function in the code below checks to see if a set is empty. If the set is nonempty, its elements are listed.
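
A minimal sketch of such a function follows. The function name `list_elements` and the printed message are assumptions made for illustration.

```python
def list_elements(S):
    """Print the elements of S, or report that S is empty."""
    if len(S) == 0:
        print("The set is empty.")
    else:
        for x in sorted(S):
            print(x)

empty = set()   # the empty set
not_a_set = {}  # an empty dict, NOT an empty set

print(type(empty).__name__)      # set
print(type(not_a_set).__name__)  # dict

list_elements(empty)      # prints: The set is empty.
list_elements({3, 1, 2})  # prints 1, 2, and 3 on separate lines
```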

It is important to note that $\{\}$ and $\emptyset$ are both ways to write the empty set. However, the mathematical set $\{ \emptyset \}$ is not the empty set because it contains one element, namely the empty set. In general, the set $A$ is not the same as the set $\{ A \}.$

Python Tip: The mathematical set $\{ \emptyset \}$ must be implemented as $\{ \text{frozenset()} \}$, which is the Python set that contains the empty frozenset. In general, anytime we want to implement a mathematical set $B$ as an element of another mathematical set $A$ in Python, we need to implement $B$ as a frozenset in order to be used as an element of the Python set $A$. This is due to the fact that elements of Python sets must be hashable; further explanation is beyond the scope of this textbook.
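
The tip above can be illustrated with a short sketch. The variable names are hypothetical; the point is that a frozenset can be an element of a Python set, while a mutable set cannot.

```python
# The mathematical set {∅}: it has one element, namely the empty set.
set_of_empty = {frozenset()}
print(len(set_of_empty))  # 1
print(len(set()))         # 0, so {∅} is not the empty set

# A mutable set is not hashable, so it cannot be an element of another set.
try:
    bad = {set()}
    nested_ok = True
except TypeError:
    nested_ok = False

print(nested_ok)  # False
```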

4.4. Subsets of a Set

Suppose $A$ and $B$ are two sets, and that every element $x$ of the set $A$ is also an element of set $B.$ We say that $A$ is a subset of a set $B,$ and write $A \subseteq B$. If a set $C$ is not a subset of $B,$ we write $C \not\subseteq B$.

If $A \subseteq B$ but $B$ contains at least one element that is not in $A,$ then $A$ is called a proper subset of $B$, denoted $A \subset B$. That is, $A$ is a proper subset of $B$ if it is a subset of $B$ but is not equal to $B.$

Example 6

Suppose that we have three sets $R = \{1,5\},$ $S = \{1,3,5\},$ and $T = \{1,4,7\}.$

$R \subseteq S,$ since each element $x$ of $R$ also is an element of $S.$
$R \subset S$ since $3$ is an element of $S$ but is not an element of $R.$
$S \not\subseteq T$ since $3$ is an element of $S$ but is not an element of $T.$ Likewise, $T \not\subseteq S$ since $4$ is an element of $T$ but is not an element of $S.$

Theorem

For any set $A$ \[\emptyset \subseteq A\] \[A \subseteq A\]

For any sets $A$ and $B$ \[ A = B \text{ if and only if both } A \subseteq B \text{ and } B \subseteq A\]

Example 7 - Subsets in Python

In Python, we can check whether a set $A$ is a subset of a set $B$ in one of the following ways:

A.issubset(B)
A <= B
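
A minimal sketch of both checks, reusing the sets $R,$ $S,$ and $T$ from Example 6. It also uses the operator `<`, which Python provides for testing proper subsets.

```python
R = {1, 5}
S = {1, 3, 5}
T = {1, 4, 7}

print(R.issubset(S))  # True
print(R <= S)         # True
print(R < S)          # True: R is a proper subset of S
print(S <= T)         # False: 3 is in S but not in T
print(T <= S)         # False: 4 is in T but not in S

print(set() <= R)     # True: the empty set is a subset of every set
print(R <= R)         # True: every set is a subset of itself
print(R < R)          # False: no set is a proper subset of itself
```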

4.5. The Power Set of a Set

Given a set $A,$ we can define a new set by collecting together all subsets of $A$. This new set is called the power set of $A.$ The power set of $A$ is denoted by $\mathcal{P}(A).$ That is, \[ \mathcal{P}(A) = \{ B \, | \, B \subseteq A \}. \] Notice that $\mathcal{P}(A)$ is a set whose elements are themselves sets.

Example 8 - The Power Set Of A Set

Suppose that $A = \{0,1,2\},$ then \[\mathcal{P}(A) = \{\emptyset, \{ 0 \}, \{ 1 \}, \{ 2 \}, \{0,1\}, \{0,2\}, \{1,2\}, \{0,1,2\}\}.\] Notice that the empty set is an element of $\mathcal{P}(A)$ along with all the other subsets of $A.$

The empty set has only one subset, namely itself. Thus, we see that \[\mathcal{P}(\emptyset) = \{\emptyset\}.\]

We can also find the power set of a power set. For example, we have the following:

\[\begin{split} \mathcal{P}(\{ 3 \}) &= \{\emptyset, \{ 3 \}\},\\ \\ \mathcal{P}(\mathcal{P}(\{ 3 \})) &= \mathcal{P}(\{\emptyset, \{ 3 \}\})\\ &= \{\emptyset, \{\emptyset\}, \{ \{ 3 \} \}, \{\emptyset, \{ 3 \}\}\}. \end{split}\]
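
Python has no built-in power set function, but one possible sketch uses `itertools.combinations` to collect the subsets of each size. The function name `power_set` is hypothetical. The subsets are stored as frozensets because the elements of a Python set must be hashable.

```python
from itertools import combinations

def power_set(A):
    """Return the power set of A as a set of frozensets."""
    elements = list(A)
    return {
        frozenset(subset)
        for r in range(len(elements) + 1)
        for subset in combinations(elements, r)
    }

P = power_set({0, 1, 2})
print(len(P))                  # 8
print(frozenset() in P)        # True: the empty set is a subset of A
print(frozenset({0, 2}) in P)  # True

# The power set of the empty set is {∅}.
print(power_set(set()) == {frozenset()})  # True

# The power set of the power set of {3} has 4 elements.
PP = power_set(power_set({3}))
print(len(PP))  # 4
```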

4.6. Cartesian Products

The Cartesian product of two sets $A$ and $B$ is the set of ordered pairs defined by

$A\times B=\{(a,b) \, | \, a\in A \text{ and } b\in B\}.$

Example 9

Consider the sets $B=\{0,1\}$, $T=\{0,1,2\}$, and $C=\{a,\ b,\ c, d\}$. Determine how many elements are in each of the following Cartesian products using the product rule, and verify by writing out each product using the roster method.

$B\ \times\ C$
$C\times B$
$B\ \times\ T$
$B\ \times\ B$
$B\ \times\ B\ \times B$

Solution

The set $B \times C$ consists of all ordered pairs of the form $(x,y)$ with $x \in B$ and $y \in C$, giving

$B \times C=\{(0,a), (0,b), (0,c), (0,d), (1,a), (1,b), (1,c), (1,d)\}$, which has $2 \times 4=8$ elements.

For $C \times B$, switch the order of the coordinates in each pair of $B \times C$ to obtain a set that also has $8$ elements:

$C \times B=\{(a,0), (b,0), (c,0), (d,0), (a,1), (b,1), (c,1), (d,1)\}.$

The set $B \times T$ consists of all ordered pairs of the form $(x,y)$ with $x \in B$ and $y \in T$, giving a set with $2 \times 3=6$ elements:

$B \times T=\{(0,0),(0,1),(0,2),(1,0),(1,1),(1,2)\}.$

The set $B \times B$ consists of all ordered pairs of the form $(x,y)$ with $x, y \in B$, giving a set with $2 \times 2=4$ elements:

$B \times B=\{(0,0),(0,1),(1,0),(1,1)\}.$

Finally, the set $B \times B \times B$ consists of all ordered triples of the form $(x,y,z)$ with $x, y, z \in B$, giving a set with $2 \times 2 \times 2=8$ elements:

$B \times B \times B=\{(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1)\}.$
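
The counts in Example 9 can be checked in Python, where `itertools.product` from the standard library generates the ordered pairs (or triples) as tuples. In this sketch the elements of $C$ are represented by the strings "a", "b", "c", and "d", which is an assumption about how to model them.

```python
from itertools import product

B = {0, 1}
T = {0, 1, 2}
C = {"a", "b", "c", "d"}

BxC = set(product(B, C))
CxB = set(product(C, B))
BxT = set(product(B, T))
BxB = set(product(B, B))
BxBxB = set(product(B, repeat=3))

print(len(BxC), len(CxB), len(BxT), len(BxB), len(BxBxB))  # 8 8 6 4 8

# Order matters in an ordered pair, so B x C and C x B are different sets.
print(BxC == CxB)                        # False
print((0, "a") in BxC, (0, "a") in CxB)  # True False
```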

Cartesian products are created using ordered pairs, so if $A$ and $B$ are different nonempty sets, then $A \times B$ is different from $B \times A$.

The Cartesian coordinate systems are Cartesian products.

The two-dimensional $xy$-plane is represented by $\mathbb{R}^2=\mathbb{R}\times \mathbb{R}=\{(x,y)|x,y\in \mathbb{R}\}$, and the three-dimensional $xyz$-space is represented by $\mathbb{R}^3=\mathbb{R}\times \mathbb{R}\times \mathbb{R}=\{(x,y,z)|x,y,z\in \mathbb{R}\}.$

4.7. Cardinality Of Sets: Finite Sets

Cardinality is the formalization of the idea of the count of the number of elements in a set.
In this section, we will prefer counting from 1 instead of 0. You will see below why this makes no difference.

Set $A$ is called a finite set if either

$A$ is the empty set or
there is a one-to-one correspondence between $A$ and the set $\{ i \in \mathbb{N} \, | \, 0 < i \leq n \} = \{1, 2, \ldots , n \}$ for some positive integer $n.$

This definition of "finite set" may seem abstract, but it’s just a formal description of what is likely the way you learned to count when you were young: You matched objects with number names (that is, numerals) as shown in the image.

The cardinality of a finite set $A,$ denoted by $|A|,$ is

0 if $A = \emptyset$ or
the value of $n$ for which there is a one-to-one correspondence between $A$ and $\{1, 2, \ldots , n \}.$

For a finite set $A$ the cardinality $|A|$ is just the number of elements in the set. The image shows that $|\{0,1,2,3,4\}| = 5.$

Example 10 - Cardinality of Finite Sets in Python

The cardinality of a finite set $A$ can be computed in Python as follows:

len(A)
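
A minimal sketch, using sets that appear earlier in this chapter (the variable names are illustrative):

```python
A = {1, -2, 0, 1, -3}
size_of_A = len(A)
print(size_of_A)  # 4: the duplicate 1 in the roster is counted only once

size_of_empty = len(set())
print(size_of_empty)  # 0: the cardinality of the empty set

print(len({0, 1, 2, 3, 4}))  # 5
```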

Example 11

Suppose that $A$ and $B$ are finite sets.

The cardinality of the Cartesian product $A \times B$ is $|A \times B|=|A| \cdot |B|$.
The cardinality of the power set of $A$ is $\left|\mathcal{P}(A)\right| = 2^{|A|}.$

Challenge

Give informal arguments to justify each of the two bulleted statements.

Hint

Your arguments can use some of the rules introduced in the Counting: Arithmetic Techniques chapter.

Answer

For the Cartesian product, the elements of $A \times B$ are ordered pairs of the form $(x,y)$ where $x\in A$ and $y\in B.$ There are $|A|$ choices for the first coordinate, and for each of those choices there are $|B|$ choices for the second coordinate. Use the product rule to conclude that there are $|A| \cdot |B|$ different ways to choose an ordered pair, so that $|A \times B|=|A| \cdot |B|.$

For the power set of $A,$ to choose a subset of $A$ you must decide for each of the $|A|$ elements in $A$ whether to include that element in the subset. There are 2 choices for each element, so using the product rule repeatedly you can conclude that there are $(2)(2)\cdots(2)$ ways to choose a subset, where the number of factors of $2$ is $|A|$. So $\left|\mathcal{P}(A)\right| = 2^{|A|}.$ You can also think of this as counting all possible bitstrings of length $|A|,$ where a 1 bit means "include the corresponding element" and a 0 bit means "omit the corresponding element."
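
The bitstring argument can be sketched in Python. The three-element list below is a hypothetical example; each bitstring of length 3 selects one subset, and different bitstrings select different subsets.

```python
from itertools import product

elements = ["x", "y", "z"]  # a hypothetical 3-element set, listed in a fixed order

subsets = []
for bits in product([0, 1], repeat=len(elements)):
    # A 1 bit means "include the corresponding element"; a 0 bit means "omit it".
    subsets.append(frozenset(e for e, bit in zip(elements, bits) if bit == 1))

print(len(subsets))                        # 8 bitstrings, so 8 subsets
print(len(set(subsets)))                   # 8: no subset was produced twice
print(len(subsets) == 2 ** len(elements))  # True
```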

4.8. Venn Diagrams

A Venn diagram, named after the English mathematician John Venn, consists of one or more circles, with each circular region representing a set. An example can be seen here.

We write the elements of a set within the circular region that represents the set; anything written outside the circular region is not an element of the set. If an element is written in the overlap of two or more regions, then it is an element of each of the sets.

The circles are often drawn inside a larger rectangle which represents a universal set $U$ that we are focusing on. In the example linked above, the rectangle was omitted because every glyph was an element of at least one of the sets represented by a circular region, but if we introduced additional glyphs like ہ we would need to draw the rectangle because that glyph would need to be written outside all three circular regions.

In this textbook, a Venn diagram must show all the possible overlaps of the sets. This is consistent with Venn’s paper from 1880.
That is, you should NOT be able to answer the question "Is x an element of set A?" when x is written in the circular region for a different set, B. In the image, the upper right example shows a Venn diagram because you could write x in the overlap of the two regions or you could write x in the part of the region for B that is outside the circular region for A. The lower two diagrams are not Venn diagrams: In either one of those, if x is written in the region for set B, it must be true that x is not an element of A (on the lower left) or that x is an element of A (the example on the lower right). Diagrams like the lower two examples will be called Euler diagrams in this textbook.
Some sources use the term Venn diagram for all four of the examples shown in the image, but you should always assume when reading this textbook that the lower two are NOT Venn diagrams. Click here to see the light!

4.9. Set Operations

We can obtain new sets by performing operations on other sets. When performing set operations, it is often helpful to consider all of our sets as subsets of a universal set $U.$ We can think of the universal set as the set of all of the objects under consideration.

We can represent set operations visually using Venn diagrams.

4.9.1. Union

The union of the sets $A$ and $B$ is the set containing those elements that are in $A$ or $B$ or both, and is denoted by $A \cup B$. More formally, \[A \cup B = \{x \in U : x \in A \text{ or } x \in B\}.\]

Note that "or" is read here as the "inclusive or". We have the following Venn Diagram for $A \cup B$:

Note that, for any sets $A$ and $B,$ \[A \cup B = B \cup A.\]

Example 12

If we let $A = \{1,2,3,4,5,6\}$ and $B = \{1,3,5,7,9\},$ then \[A \cup B = \{1,2,3,4,5,6,7,9\}.\]

Example 13 - Union in Python

In Python, we can compute the union of sets $A$ and $B$ in one of the following ways:

A.union(B)
A | B
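
A minimal sketch of both forms, using the sets $A$ and $B$ from Example 12:

```python
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}

print(sorted(A.union(B)))  # [1, 2, 3, 4, 5, 6, 7, 9]
print(sorted(A | B))       # [1, 2, 3, 4, 5, 6, 7, 9]
print(A | B == B | A)      # True: union does not depend on the order of the sets
```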

4.9.2. Intersection

The intersection of the sets $A$ and $B$ is the set containing those elements that are in both $A$ and $B$ and is denoted by $A \cap B$. More formally, \[A \cap B = \{x \in U : x \in A \text{ and } x \in B\}.\]

We have the following Venn Diagram for $A \cap B$:

Note that, for any sets $A$ and $B,$ \[A \cap B = B \cap A.\] If it is the case that $A \cap B = \emptyset,$ then we say that $A$ and $B$ are disjoint. In other words, two sets are disjoint if and only if they contain no elements in common.

Example 14

If we let $A = \{1,2,3,4,5,6\}$ and $B = \{1,3,5,7,9\},$ then \[A \cap B = \{1,3,5\}.\]

Example 15 - Intersection in Python

In Python, we can compute the intersection of sets $A$ and $B$ in one of the following ways:

A.intersection(B)
A & B
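
A minimal sketch of both forms, using the sets $A$ and $B$ from Example 14. The last lines use the set method `isdisjoint` to test whether two sets are disjoint; the sets `evens` and `odds` are hypothetical examples.

```python
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}

print(sorted(A.intersection(B)))  # [1, 3, 5]
print(sorted(A & B))              # [1, 3, 5]
print(A & B == B & A)             # True

# Two sets are disjoint exactly when their intersection is empty.
evens = {0, 2, 4}
odds = {1, 3, 5}
print(evens & odds == set())   # True
print(evens.isdisjoint(odds))  # True
print(A.isdisjoint(B))         # False
```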

4.9.3. Complement

The complement of a set $A$ is the set of all elements in the universal set $U$ which are not elements of $A$ and is denoted by $\overline{A}.$ More formally, \[\overline{A} = \{x \in U: x \not\in A\}.\] Note that other textbooks and internet sources may use different notation for the complement of $A$, such as $A'$ and $A^{c}$, but these all stand for the same set, so that $\overline{A} = A' = A^{c}$.

We have the following Venn Diagram for $\overline{A}$:

For any set $A,$ \[ \overline{\overline{A}} = A \] \[ \overline{A} \cup A = U \] \[ \overline{A} \cap A = \emptyset. \]

Example 16

Suppose that our universal set is $U = \{0,1,2,3,4,5,6,7,8,9\},$ the set of all decimal digits. If we let $A = \{1,2,3,4,5,6\}$ and $B = \{1,3,5,7,9\},$ then \[\overline{A} = \{0,7,8,9\}\] and \[\overline{B} = \{0,2,4,6,8\}.\]
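
Python's built-in set type has no complement operator, because a Python set does not know what its universal set is. One way to mirror Example 16 is to store the universal set explicitly and compute the complement as a difference. The helper name `complement` below is an illustrative choice, not something defined elsewhere in this book.

```python
# Universal set of decimal digits and the sets A and B from Example 16.
U = set(range(10))
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}


def complement(S, universe):
    """Return the elements of `universe` that are not in S (hypothetical helper)."""
    return universe - S


A_bar = complement(A, U)
B_bar = complement(B, U)
print(sorted(A_bar))  # [0, 7, 8, 9]
print(sorted(B_bar))  # [0, 2, 4, 6, 8]

# The three complement properties, checked for this particular A only.
assert complement(A_bar, U) == A
assert A_bar | A == U
assert A_bar & A == set()
```

The assertions confirm the properties for one choice of $A$; they are a spot check, not a proof.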

Example 17

Suppose that our universal set is $\mathbb{Z}.$ If we let $E$ be the set of all even integers, then $\overline{E}$ is the set of all odd integers.

4.9.4. Other Operations

The three operators complement, intersection, and union are the most commonly used to define subsets of a universal set. You will see why this is so later in the chapter.

However, there are some other operators you should be familiar with.

Difference

The difference of the sets $A$ and $B$ is the set containing those elements that are in $A$ but not in $B$ and is denoted by $A \setminus B$. Set difference is also denoted by $A - B$. More formally, \[A \setminus B = \{x \in U: x \in A \text{ and } x \not\in B\}.\]

We have the following Venn Diagram for $A \setminus B$:

Note that, for any sets $A$ and $B$, if $A \neq B,$ then \[A \setminus B \neq B \setminus A.\] However, if $A = B,$ then $A\setminus B = B \setminus A = \emptyset$.

Example 18

If we let $A = \{1,2,3,4,5,6\}$ and $B = \{1,3,5,7,9\},$ then \[A \setminus B = \{2,4,6\}\] and \[B \setminus A = \{7,9\}.\]

Example 19 - Difference in Python

In Python, we can compute the difference of sets $A$ and $B$ in one of the following ways:

A.difference(B)
A - B

Symmetric Difference

The symmetric difference of the sets $A$ and $B$ is the set containing those elements that are in $A$ or $B$ but not both $A$ and $B$. It is denoted by $A \oplus B$ in this textbook, but other books and sources may use different notation such as $A \Delta B$. More formally, \[A \oplus B = \{x \in U: (x \in A \text{ and } x \not\in B) \text{ or } (x \in B \text{ and } x \not\in A)\}.\]

We have the following Venn Diagram for $A \oplus B$:

Note that, for any sets $A$ and $B,$ \[A \oplus B = B \oplus A.\]

Example 20

If we let $A = \{1,2,3,4,5,6\}$ and $B = \{1,3,5,7,9\},$ then \[A \oplus B = \{2,4,6,7,9\}.\]

Example 21 - Symmetric Difference in Python

In Python, we can compute the symmetric difference of sets $A$ and $B$ in one of the following ways:

A.symmetric_difference(B)
A ^ B
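
The snippets in the Python examples of this section assume that `A` and `B` already exist. As a small runnable sketch, the following defines the sets used in Examples 12, 14, 18, and 20 and applies each operator to them. The variable names on the left are illustrative only.

```python
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}

union = A | B            # same result as A.union(B)
intersection = A & B     # same result as A.intersection(B)
a_minus_b = A - B        # same result as A.difference(B)
b_minus_a = B - A        # difference is not symmetric
sym_diff = A ^ B         # same result as A.symmetric_difference(B)

# sorted() is used only so that the printed order is predictable.
print(sorted(union))         # [1, 2, 3, 4, 5, 6, 7, 9]
print(sorted(intersection))  # [1, 3, 5]
print(sorted(a_minus_b))     # [2, 4, 6]
print(sorted(b_minus_a))     # [7, 9]
print(sorted(sym_diff))      # [2, 4, 6, 7, 9]
```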

4.9.5. Multiple Set Operations

We can also perform more than one set operation on a collection of sets. For example, let $A,$ $B,$ and $C$ be sets and consider the following set: \[(A \setminus B) \cup (C \setminus B).\] This is the set that is obtained by taking the union of the sets $A \setminus B$ and $C \setminus B.$ We have \[(A \setminus B) \cup (C \setminus B) = \{x \in U: (x \in A \text{ and } x \not\in B) \text{ or } (x \in C \text{ and } x \not\in B)\}.\]

We have the following Venn Diagram for $(A \setminus B) \cup (C \setminus B)$:

Note that the Venn Diagram also represents $(A \cup C ) \setminus B$. In general, there are multiple ways to describe the result of multiple set operations.
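
To see the two descriptions agree on concrete data, here is a short sketch. The three sets are arbitrary sample values chosen for illustration; they do not come from the text.

```python
# Arbitrary sample sets (assumed values, chosen only for this illustration).
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 7, 8}

left = (A - B) | (C - B)   # (A \ B) ∪ (C \ B)
right = (A | C) - B        # (A ∪ C) \ B

print(sorted(left))   # [1, 2, 7, 8]
print(sorted(right))  # [1, 2, 7, 8]
assert left == right
```

Agreement on one example does not prove that the two expressions are always equal, but it is a quick way to catch a wrong guess.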

Video Examples

The following two video examples feature Dr. Katherine Pinzon, Professor of Mathematics at Georgia Gwinnett College.

Video Example 1

Video Example 2

You Try

Draw Venn Diagrams for each of these combinations of the sets $A$, $B$, and $C$.

$A \cap (B \cup C)$
$(A \cap B) \cup C$
$(\overline{A} \cap \overline{C}) \cup B$
$(B \cup C) \setminus A$

4.10. Set Identities

Here is a collection of additional properties of the operations on sets. Each of these can be verified by drawing two Venn diagrams, one that represents the left-hand side of the equation and another that represents the right-hand side of the equation and showing that the resulting shadings of the Venn diagrams are the same.

Note that it is traditional to focus on complement, union, and intersection as the three primary set operations because the other operations such as difference and symmetric difference can be written in terms of those three primary operations, for example, $A \setminus B = A \cap \overline{B}$ and $A \oplus B = (A \cap \overline{B}) \cup (\overline{A} \cap B)$.

Associative laws: \[ A \cup (B \cup C) = (A \cup B) \cup C \] \[ A \cap (B \cap C) = (A \cap B) \cap C \]

Distributive laws: \[ A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \] \[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \]

De Morgan’s laws: \[ \overline{A \cup B} = \overline{A} \cap \overline{B} \] \[ \overline{A \cap B} = \overline{A} \cup \overline{B} \]
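
Besides drawing Venn diagrams, you can sanity-check these identities by brute force. The sketch below tries every choice of $A$, $B$, and $C$ among the subsets of a small universal set. The universe $\{0, 1, 2\}$ and the helper names `subsets` and `comp` are assumptions made for this illustration.

```python
from itertools import combinations, product

U = frozenset({0, 1, 2})  # small universe, chosen only to keep the search tiny


def subsets(universe):
    """Return a list of all subsets of `universe` as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def comp(S):
    """Complement of S relative to the universe U."""
    return U - S


all_subsets = subsets(U)
checked = 0
for A, B, C in product(all_subsets, repeat=3):
    # Associative laws
    assert A | (B | C) == (A | B) | C
    assert A & (B & C) == (A & B) & C
    # Distributive laws
    assert A | (B & C) == (A | B) & (A | C)
    assert A & (B | C) == (A & B) | (A & C)
    # De Morgan's laws
    assert comp(A | B) == comp(A) & comp(B)
    assert comp(A & B) == comp(A) | comp(B)
    # Difference and symmetric difference in terms of the primary operations
    assert A - B == A & comp(B)
    assert A ^ B == (A & comp(B)) | (comp(A) & B)
    checked += 1

print(checked)  # 512 triples (8 * 8 * 8), with no assertion failing
```

This is evidence that the identities hold, in the same spirit as comparing shaded diagrams; it is not a substitute for a proof.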

4.10.1. Operator Precedence (Order Of Operations)

To ensure that we can properly interpret an expression involving multiple set operations, we can either use parentheses or rely on operator precedence.

When an expression for sets involves parentheses, complementation, intersection, and union, we start by evaluating all expressions enclosed in parentheses from left to right, then all complementations from left to right, then all intersections from left to right, and finally all unions from left to right. (Set difference and symmetric difference were left out of this discussion because there does not seem to be a standard definition for where they fit in! But, as shown earlier, those two operations can be rewritten in terms of complementation, union, and intersection.)

For example, the expression $\overline{A} \cup B \cap C$ represents the same set as $(\overline{A}) \cup (B \cap C)$. Parentheses must be used if you want to represent a different set such as $(\overline{A} \cup B) \cap C$.

This is the same way arithmetic expressions like $-3 + 5 \cdot 2$ are evaluated: The value of $-3 + 5 \cdot 2$ is $(-3) + (5 \cdot 2) = 7$, not $(-3 + 5) \cdot 2 = 4$.
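
Python's set operators happen to follow the same convention for intersection and union: `&` binds more tightly than `|`. The sketch below uses sample sets (assumed values, not from the text) for which the two groupings give different results. Python has no complement operator for sets, so the complement is computed first as `U - A`.

```python
# Sample universe and sets, chosen so that the two groupings differ.
U = set(range(8))
A = {1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 6}
A_bar = U - A  # complement of A relative to U

no_parens = A_bar | B & C        # Python evaluates B & C first
union_last = A_bar | (B & C)     # the grouping described in the text
union_first = (A_bar | B) & C    # a different set

assert no_parens == union_last
assert no_parens != union_first
print(sorted(no_parens))    # [0, 3, 4, 5, 6, 7]
print(sorted(union_first))  # [3, 5, 6]
```

Python also assigns precedence levels to `-` and `^` (with `-` binding tighter than `&`, and `^` between `&` and `|`). That is a rule of the Python language, not a mathematical standard, so parentheses remain the safer choice when difference or symmetric difference is involved.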

4.11. Venn Diagrams, Partitions, and Bitstrings

A partition of a set $U$ is a set of subsets of $U$ such that each element $x \in U$ is a member of exactly one of the subsets in the partition.

As an example you already know, one partition of the set of integers $\mathbb{Z}$ is the set of subsets \[\{ \text{the set of even integers}, \text{the set of odd integers} \}\] Notice that every integer $n$ belongs to exactly one of the two elements of this set.

As another example, for any subset $A \subseteq U$ you have a partition of $U$ into the 2 sets that are elements of \[\{ A,\,\overline{A} \}\] Note that each element of $U$ must be in exactly one of the subsets $A$ and $\overline{A}$.

For two subsets $A$ and $B$ of a universal set $U$, consider the Venn diagram of $A$ and $B$. Notice that, by considering all possible intersections of these two sets and their complements, $U$ is partitioned into 4 subsets, namely, the 4 elements of \[\{ \overline{A} \cap \overline{B},\, \overline{A} \cap B,\,A \cap \overline{B},\,A \cap B \}\] We can refer to each of these 4 subsets by using bitstrings of length 2 as follows:

The leftmost bit is 1 if an element of the subset is an element of $A$, and is 0 if an element of the subset is not an element of $A$.
The rightmost bit is 1 if an element of the subset is an element of $B$, and is 0 if an element of the subset is not an element of $B$.

For example, in the following Venn diagram, the subset $A \cap \overline{B}$ is labeled with the bitstring $10$ because an element of $A \cap \overline{B}$ is an element of $A$ and not an element of $B$.

If you had instead three subsets $A$, $B$, and $C$ of the universal set $U,$ you could partition the universe $U$ into 8 subsets. In detail, if you have an element $x \in U$, either $x \in A$ or $x \not\in A$, and for each of those possibilities, either $x \in B$ or $x \not\in B$, and for each of those possibilities, either $x \in C$ or $x \not\in C$. We can apply (twice) the multiplication principle that was first mentioned in chapter 2 to show that there are $2 \cdot 2 \cdot 2$ possible subsets determined by the Venn diagrams of the 3 sets $A$, $B$, and $C$. Using bitstrings of length 3, we can label these 8 subsets as shown.

For an integer $n > 3$, the Venn diagram is less useful for representing the partitioning of the universe created by $n$ subsets, but we can still reason that there ought to be $2^{n}$ subsets in the partition, where each of the subsets can be described by a unique bitstring of length $n$. (We will be able to give a formal mathematical proof of this for every positive integer $n$ later in the textbook after we’ve discussed mathematical induction.)
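
The bitstring labels can be computed directly. The following sketch groups the elements of a universal set by their bitstring, with one bit per subset, leftmost bit first. The function names `region_label` and `partition_by_bitstring` are hypothetical, and the sample sets are the digit sets used earlier in this chapter.

```python
def region_label(x, sets):
    """Bitstring with one bit per set: '1' if x is in that set, otherwise '0'."""
    return "".join("1" if x in S else "0" for S in sets)


def partition_by_bitstring(universe, sets):
    """Group the elements of `universe` by their bitstring label."""
    regions = {}
    for x in universe:
        regions.setdefault(region_label(x, sets), set()).add(x)
    return regions


U = set(range(10))
A = {1, 2, 3, 4, 5, 6}
B = {1, 3, 5, 7, 9}

regions = partition_by_bitstring(U, [A, B])
for label in sorted(regions):
    print(label, sorted(regions[label]))
# 00 [0, 8]
# 01 [7, 9]
# 10 [2, 4, 6]
# 11 [1, 3, 5]
```

Only labels that at least one element actually receives appear in the dictionary, so for other choices of sets there can be fewer than $2^{n}$ keys. Passing a list of three or more sets works the same way.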

4.11.1. Disjunctive Normal Form (Set Version)

Suppose you have three sets $A$, $B$, and $C$, and have partitioned the universe $U$ into the 8 subsets as discussed above. A subset of $U$ that corresponds to any shading of the Venn diagram can be written as a union of intersections of three sets, with one set chosen from each of the pairs $\{ A,\,\overline{A} \}$, $\{ B,\,\overline{B} \}$, and $\{ C,\,\overline{C} \}$.

As an example, consider the set shown in the image, which has 4 of the 8 regions of the Venn diagram shaded:

$\overline{A} \cap \overline{B} \cap \overline{C}$ which is the region outside of all three sets,
$A \cap \overline{B} \cap \overline{C},$ the region in set $A$ but in neither $B$ nor $C,$
$\overline{A} \cap B \cap \overline{C},$ the region in set $B$ but in neither $A$ nor $C,$
$\overline{A} \cap B \cap C,$ the region in both $B$ and $C$ but not in $A.$

Write the union of these 4 subsets to create an expression that describes the entire shaded region. \[(\overline{A} \cap \overline{B} \cap \overline{C}) \cup (A \cap \overline{B} \cap \overline{C}) \cup (\overline{A} \cap B \cap \overline{C}) \cup (\overline{A} \cap B \cap C)\]

This type of expression is called a disjunctive normal form (or DNF) for the set that it represents. We will see an analog of these in a different context in the chapter on Logic.

The advantage of using the DNF is that you can write out an expression for the shaded subset using a simple algorithm. The DNF may be neither the shortest possible expression nor the most easily understood expression for the shaded part of the Venn diagram, but the DNF is a correct expression for the shaded subset.
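
The "simple algorithm" can be sketched in a few lines: for each shaded region's bitstring, intersect either the set or its complement according to each bit, then take the union of the results. The function name `dnf_set` is hypothetical, and the concrete sets are assumed sample values chosen so that each of the 8 regions contains exactly one element.

```python
def dnf_set(shaded_labels, sets, universe):
    """Union, over the shaded bitstrings, of the matching intersections."""
    result = set()
    for label in shaded_labels:
        region = set(universe)
        for bit, S in zip(label, sets):
            # Bit '1' selects the set itself; bit '0' selects its complement.
            region &= (S if bit == "1" else universe - S)
        result |= region
    return result


# Sample data (assumed): element k lies in the region whose label is k in binary.
U = set(range(8))
A = {4, 5, 6, 7}
B = {2, 3, 6, 7}
C = {1, 3, 5, 7}

# The four shaded regions from the example above, as bitstrings in A, B, C order.
shaded = ["000", "100", "010", "011"]
dnf = dnf_set(shaded, [A, B, C], U)
print(sorted(dnf))  # [0, 2, 3, 4]

# A shorter expression for the same shaded region on this sample data.
shorter = ((U - A) & B) | ((U - B) & (U - C))
assert dnf == shorter
```

The last two lines illustrate the remark above: the DNF is correct but not necessarily the shortest description of the shaded region.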

4.12. The Principle Of Inclusion-Exclusion (PIE)

The correct relationship between $|A \cup B|$, $|A|$, and $|B|$ is given by \[ |A \cup B| = |A| + |B| - |A \cap B|. \]

Another way to see that this is the correct relationship is to use the partition $\{ \overline{A} \cap \overline{B},\, \overline{A} \cap B,\,A \cap \overline{B},\,A \cap B \}$ to write

$| A | = | A \cap \overline{B} | + | A \cap B |$,
$| B | = | \overline{A} \cap B | + | A \cap B |$, and
$| A \cup B | = | A \cap \overline{B} | + | A \cap B | + | \overline{A} \cap B |$, so

$| A | + | B | = | A \cap \overline{B} | + | A \cap B | + | \overline{A} \cap B | + | A \cap B | = | A \cup B | + | A \cap B |$.

Example 22

Consider the set $U = \{ n \in \mathbb{N} : 1 \leq n \leq 60 \}$. How many elements of $U$ are divisible by either 2 or 3 or both? How many elements of $U$ are divisible by neither 2 nor 3?

Let $A$ stand for the subset of $U$ that consists of multiples of 2, and let $B$ stand for the subset of $U$ that consists of multiples of 3. It’s not too difficult to see that $|A| = \frac{60}{2} = 30$ and $|B| = \frac{60}{3} = 20$. Also, $A \cap B$ must be the subset of $U$ that consists of multiples of 6, so $| A \cap B | = \frac{60}{6} = 10$. (If these computations don’t make sense to you, just start counting off the pattern $1,\,2,\,3,\,4,\,5,\,6,\,\ldots$ and notice that every 2nd number is divisible by 2, every 3rd number is divisible by 3, and every 6th number is divisible by both 2 and 3.) Now apply the Principle Of Inclusion-Exclusion to find the number of integers in $U$ that are divisible by either 2 or 3 or both: $|A \cup B| = |A| + |B| - |A \cap B| = 30 + 20 - 10 = 40$. There are 40 integers in $U$ that are divisible by either 2 or 3 or both, so there are $60 - 40 = 20$ integers in $U$ that are divisible by neither 2 nor 3.
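
The counts in Example 22 are small enough to confirm by brute force. This sketch builds the sets explicitly and compares both sides of the inclusion-exclusion formula.

```python
U = set(range(1, 61))                # the integers 1 through 60
A = {n for n in U if n % 2 == 0}     # multiples of 2
B = {n for n in U if n % 3 == 0}     # multiples of 3

print(len(A), len(B), len(A & B))    # 30 20 10

# Both sides of |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(A | B) == len(A) + len(B) - len(A & B)

print(len(A | B))        # 40
print(len(U - (A | B)))  # 20
```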

If we want to compute the cardinality $|A \cup B \cup C|$ of the union of three given finite sets $A$, $B$, and $C$, we can again look at the Venn diagram of the partition of $U$ into 8 sets to see that some of the intersections will be counted one, two, or three times, once for each bit that is $1$.

We can derive the following formula in much the same way that we did above; in fact, we can just apply the formula we found for two sets to $| (A \cup B) \cup C |$ and use some of the set identities to help simplify the formula. \[ |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|. \]

Show me all the steps!

\begin{equation} \begin{aligned} | A \cup B \cup C | {} & = | (A \cup B) \cup C | \\ & = |A \cup B| + |C| - |(A \cup B) \cap C| \\ & = (|A| + |B| - |A \cap B|) + |C| - |(A \cup B) \cap C| \\ & = (|A| + |B| - |A \cap B|) + |C| - |(A \cap C) \cup (B \cap C)| \\ & = (|A| + |B| - |A \cap B|) + |C| - (|A \cap C| + |B \cap C| - |(A \cap C) \cap (B \cap C)|) \\ & = (|A| + |B| - |A \cap B|) + |C| - (|A \cap C| + |B \cap C| - |A \cap C \cap B \cap C|) \\ & = (|A| + |B| - |A \cap B|) + |C| - (|A \cap C| + |B \cap C| - |A \cap B \cap C|) \\ & = (|A| + |B| - |A \cap B|) + |C| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \\ & = |A| + |B| - |A \cap B| + |C| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \\ & = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \\ \end{aligned} \end{equation}

Example 23

Consider the set $U = \{ n \in \mathbb{N} : 1 \leq n \leq 60 \}$ as in the previous example. How many elements of $U$ are divisible by at least one of 2, 3, or 5? How many elements of $U$ are divisible by none of 2, 3, or 5?

As in the previous example, let $A$ stand for the subset of $U$ that consists of multiples of 2, let $B$ stand for the subset of $U$ that consists of multiples of 3, and now let $C$ stand for the subset of $U$ that consists of multiples of 5.

We have $|A| = \frac{60}{2} = 30$, $|B| = \frac{60}{3} = 20$, $|C| = \frac{60}{5} = 12$, $|A \cap B| = \frac{60}{6} = 10$, $|A \cap C| = \frac{60}{10} = 6$, $|B \cap C| = \frac{60}{15} = 4$, and $|A \cap B \cap C| = \frac{60}{30} = 2$.

Apply the Principle Of Inclusion-Exclusion to find the number of integers in $U$ that are divisible by at least one of 2, 3, or 5:

$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|$

$|A \cup B \cup C| = 30 + 20 + 12 - 10 - 6 - 4 + 2 = 44$

There are 44 integers in $U$ that are divisible by at least one of 2, 3, or 5, and there are $60 - 44 = 16$ integers in $U$ that are divisible by none of 2, 3, or 5.
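
As with the two-set case, a brute-force count is a useful cross-check of Example 23. The sketch below evaluates each term of the three-set formula and compares the result with a direct count of the union.

```python
U = set(range(1, 61))                # the integers 1 through 60
A = {n for n in U if n % 2 == 0}     # multiples of 2
B = {n for n in U if n % 3 == 0}     # multiples of 3
C = {n for n in U if n % 5 == 0}     # multiples of 5

print(sorted(A & B & C))             # [30, 60], so the last term is 2

pie = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
assert pie == len(A | B | C)

print(pie)                           # 44
print(len(U - (A | B | C)))          # 16
```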

4.13. Cardinality Of Sets: Infinite Sets

Set $A$ is called an infinite set if it is not a finite set. That is, $A$ is not the empty set, and for every positive integer $n$ there is no one-to-one correspondence between $A$ and $\{1, 2, \ldots , n \}.$

Intuitively an infinite set $A$ is at least as big as the set of positive integers. You may think that $A$ must have the same size as the set of positive integers, but cardinality is a much more … "interesting" concept for infinite sets, as you will see.

First, we will say that two infinite sets $A$ and $B$ have the same cardinality if and only if there is a one-to-one correspondence between the two sets. As an example, the set of positive integers and the set of negative integers have the same cardinality since each positive integer $n$ can be paired with its additive inverse, $-n.$

For finite sets, if $A$ is a proper subset of $B$ then it must be true that the cardinality of $A$ is not the same as the cardinality of $B.$ This fails spectacularly for infinite sets as the next few examples show.

Example 24 - The Natural Numbers and The Positive Integers

The image shows that there is a one-to-one correspondence between the natural numbers and the positive integers, so these two sets have the same cardinality. But this isn’t so bad: we only have one more number in the set of natural numbers, which is why we can just shift everything over by 1 in the image.

Notice that this example also suggests why it really does not matter whether we start counting from 1 or 0… we can always "reindex" the counting if necessary.

Challenge

Write a formula for the function that represents the one-to-one correspondence between the natural numbers and the positive integers.

Hint

The function is a linear function. If you don’t remember how to find the equation of a linear function, see this appendix.

Example 25 - The Natural Numbers and The Integers

The image shows that there is a one-to-one correspondence between the set of all integers and the natural numbers.

The image shows the inverse of the one-to-one correspondence above, which is a one-to-one correspondence between the set of natural numbers and the set of integers.

These two sets have the same cardinality, which may surprise you since "intuitively" it would seem there should be about twice as many integers as natural numbers. However, this example shows that you can double (roughly) the size of an infinite set and get a new set that has the same cardinality.

Challenge

Write a formula for a function that represents one of the two one-to-one correspondences involving the natural numbers and the integers.

Hint

Either function will be defined by two linear expressions. The definition will be of the form \[ f(n) = \begin{cases} \text{some linear expression}, & \text{ if } n \geq 0 \\ \text{some linear expression}, & \text{ if } n < 0 \\ \end{cases} \] or \[ g(n) = \begin{cases} \text{some linear expression}, & \text{ if } n \text{ is even} \\ \text{some linear expression}, & \text{ if } n \text{ is odd} \\ \end{cases} \]

Example 26 - The Natural Numbers and Ordered Pairs of Natural Numbers

This first image, which shows red points plotted in the $xy$-plane that have been labeled with natural numbers, suggests a way to define a one-to-one correspondence between the set of ordered pairs of natural numbers, $\mathbb{N} \times \mathbb{N},$ and the set of natural numbers $\mathbb{N}.$
Image credit: "Cantor’s Pairing Function" by crh23. The image is dedicated to the public domain under CC0.

This second image displays the same one-to-one correspondence in tabular form.

In the first image, notice that for each fixed value of the second coordinate $y \in \mathbb{N},$ the horizontal row of red points of the form $\{ (x, y) : x \in \mathbb{N} \} = \{ (0, y), (1, y), (2, y), (3, y), \ldots \}$ has the same cardinality as $\mathbb{N}$ and that there is one such row for every natural number $y \in \mathbb{N}.$ That is, the set of rows has the same cardinality as the set $\mathbb{N},$ and each of the rows has the same cardinality as $\mathbb{N}.$ There are, in essence, as many copies of $\mathbb{N}$ (the rows of red points) as there are elements in $\mathbb{N},$ and these copies are joined together to form the Cartesian product $\mathbb{N} \times \mathbb{N}$… but this set still has the same cardinality as $\mathbb{N}.$

Notice something else about this example: It shows that each pair of natural numbers can be encoded as a single natural number. In fact, this example can be generalized to show that any element in the set of all finite-length sequences of natural numbers can be encoded uniquely to a natural number (so, for example, the set of all possible finite-length strings of Unicode characters/code points can be encoded to the set of natural numbers, which may or may not be surprising to you.)

Mega-challenge!

Try to find an algebraic formula for the function $n = f(x,y)$ that describes the one-to-one correspondence described by the images, then show that the function must map two different inputs (ordered pairs of natural numbers) to two different outputs (natural numbers) and also that every natural number is an output from the function for some input ordered pair of natural numbers.

Hint

It’s possible to write $f(x,y)$ as a quadratic polynomial in the two variables $x$ and $y.$

You may want to read about triangular numbers to get an idea of how the mapping of ordered pairs to numbers is being done. In the first image, the red points form a "triangle of infinite height" with a vertex at $(0,0)$ and sides lying along the $x$- and $y$-axes. "Row 0" of the triangle is the single point $(0,0),$ and "row $n$" of the triangle is made up of the red points with natural number coordinates $(x,y)$ that add up to $n$ (that is, $x+y = n$).

A proof that this mapping of ordered pairs of natural numbers to individual natural numbers is in fact a one-to-one correspondence will be presented later in the textbook.

So far, every infinite set presented has the same cardinality as $\mathbb{N}.$

Maybe all infinite sets have the same cardinality as $\mathbb{N}?$ Nope!

The next theorem shows that $\mathcal{P}(\mathbb{N})$ cannot have the same cardinality as $\mathbb{N}$ so there must be at least two "infinities."

Theorem

There is no one-to-one correspondence between $\mathbb{N}$ and $\mathcal{P}(\mathbb{N}).$

Proof

This proof uses a technique called "Cantor’s diagonal argument" and is an example of the proof by contradiction technique that will be discussed later in the Proofs: Basic Techniques chapter.

Let’s suppose that we had such a one-to-one correspondence. As shown in the image, we could represent the one-to-one correspondence by a sequence $S_0 , S_1 , S_2 , \ldots$ of subsets of $\mathbb{N},$ which is what the elements of $\mathcal{P}(\mathbb{N})$ are. In the one-to-one correspondence, every subset of $\mathbb{N}$ (that is, every element of $\mathcal{P}(\mathbb{N})$) appears as one of the $S_{n}$ in the sequence: Every subset has been paired with a natural number and every natural number has been paired with a subset.

Next, define a subset $M \subseteq \mathbb{N}$ as \[M = \{ n \in \mathbb{N} : n \not\in S_n \}\] That is, $M$ is defined by the rule that for each natural number $n,$ we have $n \in M$ if and only if $n \not\in S_{n}.$ So, for example, 0 is an element of $M$ if 0 is not an element of $S_0,$ but is not an element of $M$ if 0 is an element of $S_0.$ Likewise, 1 is an element of $M$ if 1 is not an element of $S_1,$ but is not an element of $M$ if 1 is an element of $S_1.$ The same is true for each of the natural numbers 2, 3, and so on: The natural number $n$ is an element of exactly one of the sets $S_n$ and $M.$

Now we show that $M$ must be missing from the sequence $S_0 , S_1 , S_2 , \ldots$
$M$ cannot be $S_0$ since one of those sets contains 0 and the other one does not. $M$ cannot be $S_1$ since one of those sets contains 1 and the other one does not. The same must be true for each of the natural numbers 2, 3, and so on. So, for every natural number $n,$ $n$ is an element of exactly one of the two sets $S_n$ and $M,$ which means $M \neq S_n$ is true for every natural number $n.$ This means that $M$ cannot be any of the sets in the sequence… it is missing!

We assumed that every subset is listed in the sequence, but just showed that there is some subset that is not listed in the sequence. Notice that, even if we tried to use a new sequence (for example, insert $M$ at position 0 and shift all the other subsets over by adding 1 to their subscripts) the diagonal argument could be used to define another subset that is missing from the new sequence. So, every possible sequence of subsets must be missing at least one subset.

Therefore, such a one-to-one correspondence cannot exist.
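
A program cannot list infinitely many sets, but the diagonal construction itself can be illustrated on a finite list. In the sketch below, the list `listed` plays the role of $S_0, S_1, S_2, \ldots$ (its contents are arbitrary sample values), and the hypothetical function `diagonal_set` builds the part of $M$ that the listed sets determine.

```python
def diagonal_set(listed_sets):
    """Return {n : n is not in S_n}, for the sets S_0, S_1, ... that are listed."""
    return {n for n, S in enumerate(listed_sets) if n not in S}


# Arbitrary sample "sequence" S_0, ..., S_4 (assumed values, illustration only).
listed = [{0, 1, 2}, {0, 2}, set(), {1, 3}, {0, 4}]
M = diagonal_set(listed)
print(sorted(M))  # [1, 2]

for n, S in enumerate(listed):
    # n belongs to exactly one of S_n and M, so the two sets cannot be equal.
    assert (n in M) != (n in S)
    assert M != S
```

Whatever finite list you substitute for `listed`, the assertions still pass, which mirrors the step in the proof showing that $M \neq S_n$ for every $n.$ The actual theorem concerns an infinite sequence, which this finite sketch only imitates.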

4.13.1. Countable and Uncountable Sets

Set $A$ is called countable if

$A$ is a finite set or
there is a one-to-one correspondence between $A$ and $\mathbb{N}.$ In this case, $A$ is also called countably infinite.

Set $A$ is called uncountable if it is not a countable set. That is, $A$ is infinite and there is no one-to-one correspondence between $A$ and $\mathbb{N}.$

Several examples of countably infinite sets were given in the examples in the preceding subsection:

The set of positive integers $\{ i \in \mathbb{N} \, | \, i > 0 \},$
the set of integers $\mathbb{Z},$ and
the set of ordered pairs of natural numbers, $\mathbb{N} \times \mathbb{N}.$

On the other hand, the theorem in the preceding subsection shows that $\mathcal{P}(\mathbb{N})$ is an uncountable set.

Infinite Cardinal Numbers

In advanced mathematics, the concept of "infinite cardinal number" is developed and used to represent the sizes of infinite sets. Mathematicians use these infinite cardinal numbers to make sense of cardinalities like $|\mathbb{N}|$ and $|\mathcal{P}(\mathbb{N})|.$ It can be proven that \[|\mathbb{Q}| = |\mathbb{N}|\] \[|\mathbb{N}| < |\mathcal{P}(\mathbb{N})|\] \[|\mathcal{P}(\mathbb{N})| = |\mathbb{R}|\] and that \[\text{for any infinite set } A, |A| < |\mathcal{P}(A)|\] which shows that there must be infinitely-many infinite cardinal numbers.

4.14. Exercises

Remixer’s Note: This section is taken from the original “Discrete Math” book with only minor changes.

Consider as the universal set the set of all $26$ lowercase letters of the English alphabet, $U=\{a,b,c,\ldots,v,w,x,y,z\}$, and the sets $A=\{a,b,c,d,e,f,g,h\}$, $B=\{f,g,h,i,j,k\}$, and $C=\{x,y,z\}$. For the sets given below:
1. List the sets below using roster form, and
2. Draw Venn Diagrams for each of the sets
  1. $A\cup B$
  2. $A\cap B$
  3. $A\cup C$
  4. $A\cap C$
  5. $A \setminus B$
  6. $B \setminus A$
  7. $A \setminus C$
  8. $C \setminus A$
  9. $A\cup C$
  10. $A\cap C$
  11. $\overline{A}$
  12. $\overline{B}$
  13. $\overline{C}$
  14. $\overline{B} \cap \overline{C}$
  15. $ (\overline{A} \cap \overline{B}) \cup (\overline{B} \cap \overline{C})$
Using Venn Diagrams, determine which of the following are equivalent
1. $A \setminus (A \setminus B),$
  
  $A\cup B,$ and
  
  $A\cap B$
2. $A\cup \overline{A},$
  
  $A\cap \overline{A},$
  
  $U,$ and
  
  $\emptyset$
3. $\overline{A}\cap \overline{B}, $
  
  $\overline{A\cap B},$
  
  $\overline{A}\cup \overline{B},$ and
  
  $\overline{A\cup B}$
4. $A\cup (B\cap C),$
  
  $A\cap (B\cup C),$
  
  $(A\cap B)\cup (A\cap C),$ and
  
  $(A\cup B)\cap (A\cup C),$
5. $\overline{\overline{A}\cup(C \setminus B)},$
  
  $A\cap (B \cup \overline{C}),$ and
  
  $A \setminus (C \setminus B)$
Write each of the following sets using set builder notation
1. $\{\ldots, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, \ldots \}$
2. $\{\ldots, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10,\ \ldots \}$
3. $\{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 \}$
4. $\left\{ 1,\frac{1}{2},\frac{1}{3},\frac{1}{4},\frac{1}{5},\ldots \right\}$
5. $\{0, 1, 4, 9, 16, 25, 36, 49, \ldots \}$
6. $\{\ldots,-10,-6, -2, 2, 6, 10, 14, 18, 22, \ldots \}$
7. $\{ 3, 9, 27, 81, 243,\ldots\}$
8. $\{ 1, 9, 25, 49, 81, \ldots \}$
Write each of the following sets in roster form
1. $\{x \in \mathbb{R} : |2x+5|=7\}$
2. $\{10n : n \in \mathbb{N}\}$
3. $\{10n : n \in \mathbb{Z}\}$
4. $\left\{2^n : n \in \mathbb{N}\right\}$
5. $\left\{2^n : n \in \mathbb{Z}\right\}$
6. $\left\{x \in \mathbb{R} : x^2=4\right\}$
7. $\left\{x \in \mathbb{R} : x^3=64\right\}$
8. $\left\{x \in \mathbb{Z} : x^2=5\right\}$
9. $\left\{x \in \mathbb{R} : x^2= -4\right\}$
10. $\left\{x \in \mathbb{Z} : |x-5|=3\right\}$
11. $\left\{3n+4 : n \in \mathbb{N}\right\}$
12. $\left\{3n+4 : n \in \mathbb{Z}\right\}$
13. $\left\{i^n : n \in\mathbb{N}\right\}$, where $i$ is such that $i^2=-1$ (the imaginary unit).
Consider the sets $A=\{1, 3, 5, 7, 9, 11, 13, 15, 17\}$, $B=\{2, 5, 7, 11\}$, and $C=\{1, 2, 3\}$,
1. Determine the cardinalities of the following sets,
  1. $|A|$
  2. $|A\cup B|$
  3. $|A\cap C|$
  4. $|\mathcal{P}(A)|$
  5. $|\mathcal{P}(B)|$
  6. $|\mathcal{P}(C)|$
2. Give the following power sets,
  1. $\mathcal{P}(B)$
  2. $\mathcal{P}(C)$
Determine the cardinalities of the following sets,
1. $\{n \in \mathbb{Z} : |n|\leq 10\}$
2. $\{A,B, \emptyset,\{2,5,6\}\}$
3. $\{\{A,B\},\{\},\{\{2,5,6\}\},\{\{2,5,6\},C\},\{A,B,C\}\}$
4. $\{\{\{A,B\},\emptyset,\{\{2,5,6\},C\},\{A,B,C\}\}\}$
Consider the sets, $B=\{0, 1\}$, $ S=\{spring, summer, fall, winter\}$, and $C=\{ a, b, c, d,e\}$. For each of the following sets:
1. Determine the following Cartesian products.
2. Calculate the cardinality of each Cartesian product.
  1. $B \times S$
  2. $S \times B$
  3. $B \times C$
  4. $C \times B$
  5. $B \times B \times B \times B$
  6. $S \times B \times B$
Determine the following power sets,
1. $\mathcal{P}(\{Alabama, Georgia, Florida, Louisiana\} )$
2. $\mathcal{P}(\emptyset )$
3. $\mathcal{P}(\{\emptyset\} )$
4. $\mathcal{P}(\{Alabama \} )$
5. $\mathcal{P}(\{Alabama, Georgia, Florida \} )$
6. $\mathcal{P}(\{\{Alabama, Georgia \}, \{Florida \} \} )$
Write the shaded regions in each of the following Venn diagrams using set notation.
Determine if each of the following are true or false. Explain your reasoning.
1. $\{7,4,6,2,11,3,5\}\subseteq \{1,2,3,4,5,6,7,8,9,10,11,12,13\}$
2. $\{1,2,3,4,5,6,7,8,9,10,11,12,13\}\subseteq \{7,4,6,2,11,3,5\}$
3. $\{7,4,6,2,11,3,5\}\subseteq \{7,4,6,2,11,3,5\}$
4. $\{3,8\}\nsubseteq \{7,4,6,2,11,3,5\}$
5. $ \{3n+4 : n \in \mathbb{N}\} \nsubseteq \mathbb{Z}$
6. $\mathbb{N}\subseteq \mathbb{Z}\subseteq \mathbb{Q}\subseteq \mathbb{R}$
7. $\{x \in \mathbb{R} : |x|<3\}\subseteq \{x \in \mathbb{R} \, | \, |x|<5\}$
8. $\{x \in \mathbb{R} : |x|>3\}\subseteq \{x \in \mathbb{R} \, | \, |x|>5\}$

5. Logic

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on September 23, 2025.
Edited section on logic circuits.

Logic is the study of reasoning. Logic is used to create, analyze, and validate arguments, where an argument is a finite sequence of statements that ends with a conclusion based on inferences made from earlier statements in the argument.

Among the applications of logic to computer science are design of electronic circuits and validation of algorithms and programs.

Key terms and concepts covered in this chapter:

Propositional logic (also called "propositional calculus")
- Logical operators (also called "logical connectives")
  - Negation ("not")
  - Conjunction ("and")
  - Disjunction ("or")
  - Conditional ("implication")
    
    The converse, inverse, and contrapositive of a conditional
  - Biconditional ("if and only if," abbreviated as "iff")
- Truth tables
- Well-formed formulas
- Satisfiability, tautology, and contradiction
- Normal forms (conjunctive and disjunctive)
Predicate logic
- Predicates as "statement-valued functions"
- Quantification of Predicates
  - Universal quantifier
  - Existential quantifier
To be added to this chapter after May 23, 2025:
- Limitations of propositional and predicate logic (e.g., expressiveness issues)
- Boolean algebra and Boolean circuits

5.1. Propositional Logic

A proposition is a statement that declares a fact that is either True or False (but not both!)

Propositions

Atlanta is the capital of California.
$1 + 1 = 2$
$1 + 1 = 3$

Not propositions

How much is this cookie?
Please sit down.
Wow!
This sentence is false.
$x + 1 = y$

Propositional logic consists of a set of formal rules for combining propositions in order to derive new propositions.

A goal of propositional logic is to have a method for creating valid arguments that are sequences of propositions, where the correctness and validity of the argument is based solely on the propositions' truth values (True and False), ignoring the actual content of the propositions. Compare this to doing algebra: You can write $2 (x + 3 y) = 2x + 6y$ because it is a correct and valid step to distribute multiplication over addition. You can do the algebra and ignore the specific numerical values (the "numerical content") that $x$ and $y$ stand for.

In propositional logic, it is traditional to use propositional variables such as p, q, and r to stand for the possible assignments of truth values to propositions; often, the propositional variables themselves are referred to as the propositions. Again, compare this to the algebraic example where you can treat $x$ and $y$ as numbers even though they are actually variables that stand for numbers.

What is the advantage of using symbols? A long time ago, philosophers discovered that it is easier to follow lines of reasoning by putting our thoughts into symbols. This was an important step in the eventual development of modern technological society and, in particular, electronic computers. Before a computer can do its work, humans need to put our thoughts into it; however, a spoken language like English can be too difficult to use because many different phrases can represent the same logical statements.

Example 1 - Generalizing and abstracting an argument

Consider the following argument consisting of three propositions:

Sarah earned a B.S. in Computer Science.
Anyone who earned a B.S. in Computer Science must have earned a C or better in Discrete Math.
Therefore, Sarah earned a C or better in Discrete Math.

This argument is valid: If you ASSUME that the first two propositions are True, then you can correctly conclude that the third proposition (which follows the introductory phrase "Therefore,") must be True as well.

Notice that you could change the name "Sarah" to "Daniel" without affecting the validity of the argument.

You can generalize the argument by changing "Sarah" to "the student." In fact, you can generalize much more by noticing that the form of the argument matches the following argument.

Individual X is a member of category A.
Any individual that is a member of category A must also be a member of category B.
Therefore, individual X is a member of category B.

You can make this argument completely abstract using propositional variables.

p is true.
The implication "if p then q" is true, too.
Therefore, q is true.

You can build compound propositions (also called "propositional functions") from simpler propositions. For example, in the preceding example, the compound proposition "if p then q" was introduced as a new proposition built from the propositional variables p and q. In the next section, you will learn how to represent compound propositions using symbols.

5.2. Logical Operations and Truth Tables

In this section, compound propositions will be represented by using propositional variables and logical operator symbols (also called "logical operations" or "logical connectives"). Once again, you can compare this to how numerical and algebraic relationships can be represented in symbols using algebraic expressions built with the usual arithmetic operator symbols with variables and numerals.

In Python, we use $\texttt{Boolean}$ variables to represent propositions and define functions for each compound proposition. Each compound proposition can be implemented using the $\texttt{Boolean}$ operations $\texttt{not}$, $\texttt{and}$, and $\texttt{or}$ discussed in the section Operators and Expressions in the chapter "Appendix: An Introduction to Python".

A truth table can be used to display the truth values of a compound proposition that is built from propositional variables and logical operators. A truth table is created with rows representing all possible interpretations of the propositional variables, that is, all possible assignments of truth values to the propositional variables. Each column of a truth table displays the truth values for either one of the propositional variables or a compound proposition built up from propositional variables and/or simpler compound propositions. As an analogy, think of how a table can be used to display the numerical input and output values of a function represented by an algebraic expression.

The most commonly-used logical operators are described in the rest of this section.

5.2.1. Negation

"I am not an astronaut."

The negation of a proposition $p$, denoted in mathematics by $\neg p$ and read as "not $p$", is the proposition "It is not the case that $p$". The proposition $\neg p$ has the opposite truth value to $p.$
Other textbooks and sources may use $\overline{p}$ or $\sim \! p$ to represent $\neg p.$

\(p\)	\(\neg p\)
True	False
False	True

For example, the negation of the proposition "Today is Friday." would be "It is not the case that today is Friday." or more succinctly "Today is not Friday."
For a proposition $p,$ exactly one of $p$ and $\neg p$ is True and exactly one is False. Two propositions can be contrary (that is, they could not both be True) without being negations of each other: As an example, both of the propositions "Today is Friday." and "Today is Saturday." could be False, so "Today is Saturday." is not the negation of "Today is Friday."

Notice that the two propositions $p$ and $\neg (\neg p)$ must always have the same truth value. (You can see this by inserting a column for $\neg (\neg p)$ in the truth table shown earlier in this subsection.)

In the first few truth tables in this chapter, "True" and "False" are spelled out, but it is more often the case that these words are abbreviated to their first letters, "T" and "F" in truth tables.

Example 2 - Negation in Python

The code below prints the truth table for negation. Note that the values True and False are constants in Python, and that not p implements the negation $\neg p$ in Python.

Try to predict the variable names, values, and data types at different steps in the execution. Use the Next button to check your answers.
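
The interactive listing is not reproduced in this text. As a minimal sketch (the function name `negation_table` is made up for this illustration and is not necessarily what the interactive code uses), a truth table for negation could be printed like this:

```python
def negation_table():
    """Return the rows (p, not p) of the truth table for negation."""
    rows = []
    for p in [True, False]:      # the two possible truth values of p
        rows.append((p, not p))  # "not p" is Python's negation
    return rows

print("p\tnot p")
for p, not_p in negation_table():
    print(p, not_p, sep="\t")
```

The loop visits True before False so that the rows appear in the same order as in the table above.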

5.2.2. Conjunction

"I am a rock and I am an island."

Let $p$ and $q$ be propositions. The conjunction of $p$ and $q$, denoted in mathematics by $p \land q$ and read as "$p$ and $q$", is True when both $p$ and $q$ are True and is False otherwise.

\(p\)	\(q\)	\(p \land q\)
True	True	True
True	False	False
False	True	False
False	False	False

Notice that the two propositions $p \land q$ and $q \land p$ always have the same truth value.

Example 3 - Conjunction in Python

The code below prints the truth table for conjunction. Try to predict the variable names, values, and data types at different steps in the execution. Use the Next button to check your answers.
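
The interactive listing is not reproduced in this text. One possible sketch, assuming a hypothetical helper named `conjunction_table`, uses two nested loops so that every combination of truth values for $p$ and $q$ is visited:

```python
def conjunction_table():
    """Return the rows (p, q, p and q) of the truth table for conjunction."""
    rows = []
    for p in [True, False]:
        for q in [True, False]:
            rows.append((p, q, p and q))  # "and" is Python's conjunction
    return rows

print("p\tq\tp and q")
for row in conjunction_table():
    print(*row, sep="\t")
```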

5.2.3. Disjunction

"They studied hard or they are extremely bright."

Let $p$ and $q$ be propositions. The disjunction of $p$ and $q$, denoted in mathematics by $p \lor q$ and read as "$p$ or $q$", is True when at least one of $p$ and $q$ is True and is False otherwise.

\(p\)	\(q\)	\(p \lor q\)
True	True	True
True	False	True
False	True	True
False	False	False

Notice that the two propositions $p \lor q$ and $q \lor p$ always have the same truth value.

Example 4 - Disjunction in Python

The code below prints the truth table for disjunction. Try to predict the variable names, values, and data types at different steps in the execution. Use the Next button to check your answers.
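
The interactive listing is not reproduced in this text. A possible sketch, written this time with a list comprehension (the name `disjunction_table` is an assumption of this illustration), is:

```python
def disjunction_table():
    """Return the rows (p, q, p or q) of the truth table for disjunction."""
    return [(p, q, p or q)            # "or" is Python's (inclusive) disjunction
            for p in [True, False]
            for q in [True, False]]

print("p\tq\tp or q")
for row in disjunction_table():
    print(*row, sep="\t")
```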

5.2.4. Conditional

"If you get a 100 on the final exam, then you earn an A in the class."

Let $p$ and $q$ be propositions. The conditional statement $p \rightarrow q$, read as "if p then q", "p implies q", or, more formally, "the conditional with hypothesis p and conclusion q", is the proposition that is False when p is True and q is False, and True otherwise. The conditional statement $p \rightarrow q$ is also called "the implication $p \rightarrow q$".
The conditional $p \rightarrow q$ can also be denoted by $p \Rightarrow q$ or $p \implies q.$ In addition, there are many other ways to express the conditional $p \rightarrow q$ in English, two of which are "p only if q" and "q if p".

\(p\)	\(q\)	\(p \rightarrow q\)
True	True	True
True	False	False
False	True	True
False	False	True

$p \rightarrow q$ and $q \rightarrow p$ do NOT always have the same truth value!

The conditional can be considered a "contract" which fails only when the conditions are met and the results are not fulfilled.

The conditional may or may not represent a "cause-and-effect" relationship. For example, the conditional "if Shakespeare wrote Hamlet then $2 + 2 = 4$" is a True proposition because the conclusion "$2 + 2 = 4$" is True, but the arithmetic equation is not an effect that was caused by the authorship of Hamlet.

Example 5 - You Try: Conditional in Python

Complete the code below by clicking one of the "edit" links then replacing $#FIX ME#$ with an expression involving $p$, $q$, and some of the Python operators not, and, and or. Once correctly defined, the correct truth table for the conditional statement should print.

Edit in PythonTutor

The Converse, Contrapositive and Inverse of a Conditional Statement

Given propositions p and q, we can form three additional compound propositions that are related to the conditional $p \rightarrow q$:

$q \rightarrow p$, called the converse of $p \rightarrow q$
$ \neg q \rightarrow \neg p$, called the contrapositive of $p \rightarrow q$
$ \neg p \rightarrow \neg q$, called the inverse of $p \rightarrow q$

The extended truth table for the conditional and the three related propositions is shown below.

\(p\)	\(q\)	\(p \rightarrow q\) (conditional)	\(q \rightarrow p \) (converse)	\( \neg q \rightarrow \neg p\) (contrapositive)	\( \neg p \rightarrow \neg q\) (inverse)
True	True	True	True	True	True
True	False	False	True	False	True
False	True	True	False	True	False
False	False	True	True	True	True

From the truth table it can be seen that

$p \rightarrow q$ and the converse $q \rightarrow p$ do NOT always have the same truth value!
$p \rightarrow q$ and its contrapositive $ \neg q \rightarrow \neg p$ MUST have the same truth value!
The converse $q \rightarrow p$ and the inverse $ \neg p \rightarrow \neg q$ MUST have the same truth value.

In the section on logically equivalent propositions we will discuss the bullet points in the preceding note in more detail.

The next example illustrates these four propositions.

Example 6 - Conditional, Converse, Contrapositive and Inverse.

Translate the statement "If the number of students in class is divisible by 4, then the number of students in class is divisible by 2" using a conditional.
Form and translate the converse, contrapositive, and inverse.

Solution

Let

$p$ be the proposition "The number of students in class is divisible by 4."

$q$ be the proposition "The number of students in class is divisible by 2."

The conditional $p\rightarrow q$ translates as "If the number of students in class is divisible by 4, then the number of students in class is divisible by 2."
The converse $q \rightarrow p$ may be translated as "If the number of students in class is divisible by 2, then the number of students in class is divisible by 4."

The contrapositive $ \neg q \rightarrow \neg p$ may be translated as "If the number of students in class is not divisible by 2, then the number of students in class is not divisible by 4."

The inverse $ \neg p \rightarrow \neg q$ may be translated as "If the number of students in class is not divisible by 4, then the number of students in class is not divisible by 2."

Notice that in this example, the conditional must be True, based on properties of factors of integers, but its converse could be False: Consider the case where the number of students in class is equal to 26, so $p$ is False and $q$ is True, and $p\rightarrow q$ is True but $q\rightarrow p$ is False. This also shows, again, that the conditional need not represent a "cause-and-effect" relationship, since NOT being "divisible by 4" does not let us conclude anything about being "divisible by 2".
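
The four related propositions in this example can be evaluated for any particular class size. The sketch below is only an illustration: the names `CONDITIONAL`, `implies`, and `four_forms` are made up here, and the conditional is looked up row by row from the truth table given earlier in this section rather than computed from other operators.

```python
# Truth table for the conditional, copied from the table in this section.
CONDITIONAL = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

def implies(p, q):
    """Truth value of the conditional with hypothesis p and conclusion q."""
    return CONDITIONAL[(p, q)]

def four_forms(n):
    """Evaluate the conditional and its three relatives for a class of n students."""
    p = (n % 4 == 0)   # "n is divisible by 4"
    q = (n % 2 == 0)   # "n is divisible by 2"
    return {
        "conditional": implies(p, q),
        "converse": implies(q, p),
        "contrapositive": implies(not q, not p),
        "inverse": implies(not p, not q),
    }

print(four_forms(26))
```

For 26 students the conditional and the contrapositive are both True while the converse and the inverse are both False, which agrees with the pairings observed in the extended truth table.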

5.2.5. Biconditional

"It is raining outside if and only if it is a cloudy day."

Let $p$ and $q$ be propositions. The biconditional $p \leftrightarrow q$, read as "p if and only if q", is the proposition that is True when p and q have the same truth value, and False otherwise. The biconditional is also called "the bi-implication". Note that $p \leftrightarrow q$ can also be denoted by $p \Leftrightarrow q$ or $p \iff q.$

\(p\)	\(q\)	\(p \leftrightarrow q\)
True	True	True
True	False	False
False	True	False
False	False	True

You can use a truth table to show that two propositions $p \leftrightarrow q$ and $q \leftrightarrow p$ always have the same truth value.

The biconditional $p \leftrightarrow q$ is read as "p if and only if q" because it has the same truth table as the conjunction of the two conditionals "p if q" (that is, $q \rightarrow p$) and "p only if q" (that is, $p \rightarrow q$).

Example 7 - You Try: Biconditional in Python

Edit in PythonTutor

It is important to contrast the conditional with the biconditional. Consider the conditional example "If you get a 100 on the final exam, then you earn an A in the class." This means that when you get a 100 on the final you also get an A in the class. The conditional represents a one-way contract: You earn an A in the class if you get a 100 on the final exam. There is nothing said about the result (the grade you earn in the class) if you do NOT meet the condition (get a 100 on the final exam).

As a biconditional the example would say "You get a 100 on the final exam if and only if you earn an A in the class." This becomes a two-way contract: You earn an A in the class if you get a 100 on the final, but you do not earn an A in the class if you do not get a 100 on the final.

5.2.6. Other Compound Propositions

The negation, disjunction, conjunction, conditional, and biconditional are the most commonly-used logical operators for forming compound propositions and will be the ones used throughout the rest of this chapter. However, there are at least three others you should know about.

Exclusive Disjunction

"I took either 2 Advil or 2 Tylenol."

Let $p$ and $q$ be propositions. The exclusive disjunction of $p$ and $q$ (also known as xor), denoted in mathematics by $p \oplus q$, is True when exactly one of $p$ and $q$ is True and is False otherwise.

\(p\)	\(q\)	\(p \oplus q\)
True	True	False
True	False	True
False	True	True
False	False	False

Notice that the two propositions $p \oplus q$ and $q \oplus p$ always have the same truth value.

The NAND and NOR Operators

The NAND and NOR operators correspond to two important digital logic gates used in electronic devices.

NAND

"An onion is not both a fruit and a vegetable."

Let $p$ and $q$ be propositions. In this textbook, the NAND of $p$ and $q$ is denoted by $p \uparrow q$ and is False when both $p$ and $q$ are True and is True otherwise. That is, the NAND is the negation of $p \land q$ - think of NAND as "Not AND."

\(p\)	\(q\)	\(p \uparrow q\)
True	True	False
True	False	True
False	True	True
False	False	True

Notice that the two propositions $p \uparrow q$ and $q \uparrow p$ always have the same truth value.

NOR

"This pen’s ink is neither red nor blue."

Let $p$ and $q$ be propositions. In this textbook, the NOR of $p$ and $q$ is denoted by $p \downarrow q$ and is True when both of $p$ and $q$ are False and is False otherwise. The NOR is also referred to as "joint denial" since it is True exactly when neither $p$ nor $q$ is True. That is, the NOR is the negation of $p \lor q.$

\(p\)	\(q\)	\(p \downarrow q\)
True	True	False
True	False	False
False	True	False
False	False	True

Notice that the two propositions $p \downarrow q$ and $q \downarrow p$ always have the same truth value.

5.2.7. Well-Formed Formulae and Operator Precedence (Order Of Operations)

A well-formed formula (or wff for short) is a string of symbols that represents a compound proposition.

Here is a recursive definition of wff:

A propositional variable is a well-formed formula.
If $\alpha$ ("alpha") and $\beta$ ("beta") are well-formed formulas, then the following are also well-formed formulas:
- $\left( \neg \alpha \right)$
- $\left( \alpha \land \beta \right)$
- $\left( \alpha \lor \beta \right)$
- $\left( \alpha \rightarrow \beta \right)$
- $\left( \alpha \leftrightarrow \beta \right)$

The definition of wff allows you or, even better, a computer to analyze any string of symbols to determine whether the string of symbols is a wff. For example, $(p \land q \lor r)$ isn’t a wff but both $(p \land (q \lor r))$ and $((p \land q) \lor r)$ are wffs. You could write code to implement an algorithm to validate a string as a wff.
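
As one possible sketch of such a validator, the function below follows the recursive definition directly. It rests on several assumptions made only for this illustration: propositional variables are single lowercase letters, the operators are typed as the characters ¬, ∧, ∨, →, and ↔, spaces are ignored, and the names `parse_wff` and `is_wff` are hypothetical.

```python
BINARY_OPERATORS = "∧∨→↔"

def parse_wff(s, i):
    """Return the index just past one wff that starts at s[i], or -1 if there is none."""
    if i >= len(s):
        return -1
    if "a" <= s[i] <= "z":           # base case: a propositional variable
        return i + 1
    if s[i] != "(":                  # every other wff must start with "("
        return -1
    i += 1
    if i < len(s) and s[i] == "¬":   # the form (¬α)
        i = parse_wff(s, i + 1)
    else:                            # the form (α op β)
        i = parse_wff(s, i)
        if i == -1 or i >= len(s) or s[i] not in BINARY_OPERATORS:
            return -1
        i = parse_wff(s, i + 1)
    if i == -1 or i >= len(s) or s[i] != ")":
        return -1
    return i + 1

def is_wff(text):
    """True if the whole string is exactly one well-formed formula."""
    s = text.replace(" ", "")
    return len(s) > 0 and parse_wff(s, 0) == len(s)

print(is_wff("(p ∧ q ∨ r)"))     # not a wff
print(is_wff("(p ∧ (q ∨ r))"))   # a wff
print(is_wff("((p ∧ q) ∨ r)"))   # a wff
```

Because this sketch follows the definition strictly, a string such as ¬p without its surrounding parentheses is rejected.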

It can be shown that every compound proposition can be represented by at least one wff. However, a wff may be difficult to read quickly if it contains many parentheses. As an example, it is not easy to read $( (p \rightarrow q) \lor ( (\neg r) \land (s \leftrightarrow t) ) )$. For this reason, we can introduce operator precedence rules that allow us to eliminate some of the parentheses.

To evaluate a compound proposition, we start by evaluating

all expressions enclosed in parentheses from left to right, then
all negations from left to right, then
all conjunctions from left to right, then
all disjunctions from left to right, then
all conditionals from left to right, and finally
all biconditionals from left to right.

This allows us to drop some parentheses from a wff that represents a compound proposition.

For example, the compound proposition $\neg p \lor q \land r \rightarrow s$ represents the same proposition as the wff $(((\neg p) \lor (q \land r)) \rightarrow s)$. At least some of the parentheses must be used if you want to represent a different proposition such as $(\neg p \lor q) \land (r \rightarrow s)$.
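
Python's own precedence for not, and, and or follows the same order as the rules above (not binds most tightly, then and, then or), so the fragment $\neg p \lor q \land r$ of this example can be tested directly. Python has no built-in operator for the conditional, so the "$\rightarrow s$" part is left out of this sketch, and the function names are made up for illustration.

```python
from itertools import product

def without_parentheses(p, q, r):
    return not p or q and r

def fully_parenthesized(p, q, r):
    return (not p) or (q and r)

def different_grouping(p, q, r):
    return (not p or q) and r

interpretations = list(product([True, False], repeat=3))

# The unparenthesized expression agrees with the fully parenthesized one everywhere.
same = all(without_parentheses(*v) == fully_parenthesized(*v) for v in interpretations)

# Moving the parentheses produces a different proposition.
differs = any(without_parentheses(*v) != different_grouping(*v) for v in interpretations)

print(same, differs)
```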

5.2.8. Truth Tables of Compound Propositions

To compute the truth values of a longer compound proposition or wff by hand, it can be useful to break up the proposition or wff into the smaller propositions or wffs that it was built from.

Example 8

The code below reveals the truth table of the compound proposition:

$(p \land q) \lor \neg q$

Recall: $\neg q$ is mathematical shorthand for not q.

Edit in PythonTutor

You Try

Edit the code above to reveal the truth table of the compound proposition:

$(p \lor \neg q) \land \neg p$

Hint: You only need to change line 10.

When creating your own truth table it is crucial to be systematic about ensuring you have all possible truth values for each of the simple propositions. Each simple proposition has two possible truth values, so the number of rows in the table should be $2^n$ where $n$ is the number of propositions (Do you recall why the number of rows must be $2^n$?) You should also consider breaking complex propositions into smaller pieces.
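
One systematic way to list all $2^n$ interpretations in code is `itertools.product` from Python's standard library. The sketch below is an illustration only (the helper names `interpretations` and `truth_table` are assumptions, and this is not the interactive listing from Example 8); it builds the table for $(p \land q) \lor \neg q$.

```python
from itertools import product

def interpretations(n):
    """All 2**n assignments of truth values to n propositional variables."""
    return list(product([True, False], repeat=n))

def truth_table(formula, n):
    """One row per interpretation: the variable values followed by the formula's value."""
    return [values + (formula(*values),) for values in interpretations(n)]

print(len(interpretations(3)))   # 2**3 rows for three simple propositions

for row in truth_table(lambda p, q: (p and q) or not q, 2):
    print(*row, sep="\t")
```

The rows come out in the same order used in this chapter's tables, starting with every variable True and ending with every variable False.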

Example 9

Create a truth table for the compound proposition:

$(p \land q) \rightarrow (p \land r)$ for all values of $p, q, r$.

Solution

The truth table should have 8 rows, since there are three simple propositions and each one has two possible truth values ($2^3 = 8$).

\(p\)	\(q\)	\(r\)	\(p \land q\)	\(p \land r\)	\((p \land q) \rightarrow (p \land r)\)
T	T	T	T	T	T
T	T	F	T	F	F
T	F	T	F	T	T
T	F	F	F	F	T
F	T	T	F	F	T
F	T	F	F	F	T
F	F	T	F	F	T
F	F	F	F	F	T

5.3. Logically Equivalent Propositions

Recall that an interpretation of a proposition is an assignment of truth values to the propositional variables.

Two propositions are considered logically equivalent (or simply equivalent) if they have the same truth values for every possible interpretation. It is often easiest to see this by constructing a truth table for the two propositions and comparing.

Example 10

Consider the propositions $\neg p \lor q$ versus $p\rightarrow q$.

\(p\)	\(q\)	\(\neg p \lor q\)	\(p \rightarrow q\)
True	True	True	True
True	False	False	False
False	True	True	True
False	False	True	True

Since the two compound propositions have the same truth value in every row of the truth table, they are equivalent.

We use the symbol $\equiv$ to denote that two propositions are logically equivalent. So in the preceding example, we would write $\neg p \lor q \equiv p\rightarrow q$.

$\equiv$ is NOT a logical operator used to build compound propositions, but instead is used to say that two propositions are logically equivalent. This is similar to how $=$ is used in arithmetic: We can write $2 + 2 = 5 - 1$ to say that $2 + 2$ and $5 - 1$ are numerically equivalent, but we don’t use the $=$ sign as an arithmetic operator to actually do any arithmetic.

Saying that two propositions p and q are logically equivalent is the same as saying that the biconditional compound proposition $p \leftrightarrow q$ is always True.
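
Checking logical equivalence by comparing every row of two truth tables is easy to automate. In the sketch below, the names `equivalent`, `conditional`, and `converse` are hypothetical, and the conditional is looked up from its truth table; the check confirms the equivalence from Example 10 and shows that a conditional is not equivalent to its converse.

```python
from itertools import product

# Truth table for the conditional, as given in the section on the conditional.
CONDITIONAL = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

def equivalent(f, g, n):
    """True if f and g have the same truth value for every interpretation of n variables."""
    return all(f(*v) == g(*v) for v in product([True, False], repeat=n))

def conditional(p, q):
    return CONDITIONAL[(p, q)]

def converse(p, q):
    return CONDITIONAL[(q, p)]

print(equivalent(lambda p, q: (not p) or q, conditional, 2))  # Example 10
print(equivalent(conditional, converse, 2))
```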

Example 11

Consider three compound propositions:

1. $(p\land q) \rightarrow r$
2. $(p \rightarrow q) \land (p \rightarrow r)$
3. $p \rightarrow (q \land r)$

The code below reveals the truth table for proposition 1. Modify it for propositions 2 and 3 in order to determine which of the three compound propositions are equivalent.

Hint: You only need to change line 11.

Edit in PythonTutor

5.3.1. Tautologies, Contradictions and Contingencies

A proposition is a tautology if its truth value is always True. That is, a tautology is True for every possible interpretation of its propositional variables.

A proposition is called satisfiable if there is at least one interpretation for which the proposition is True.

A proposition is unsatisfiable if there is no interpretation for which the proposition is True.

A proposition is a contradiction if its truth value is always False. That is, a contradiction is False for every possible interpretation of its propositional variables. This is just another way of saying that it is unsatisfiable.

A proposition that is neither a tautology nor a contradiction is said to be a contingency since its truth value can be either True or False, contingent on the truth value assigned to its propositional variables.

Example 12 - Tautology and Contradiction

$p \lor \neg p$ is an example of a tautology.

$p \land \neg p$ is an example of a contradiction.

This can be seen in the truth table.

\(p\)	\(\neg p\)	\( p \lor \neg p\)	\(p \land \neg p\)
True	False	True	False
False	True	True	False

Notice that the truth values for $p \lor \neg p$ are all True and $p \land \neg p$ are all False.

The two compound propositions in the previous example are so important that they have their own names.

The Law of Excluded Middle

Given any proposition $p$, the compound proposition \[p \lor \neg p\] is a tautology (that is, the compound proposition is always True.)

The Law of Contradiction

Given any proposition $p$, the compound proposition \[p \land \neg p\] is a contradiction (that is, the compound proposition is always False.)
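
The definitions in this subsection can be checked by brute force over all interpretations. The following is a minimal sketch; the function name `classify` and its return strings are assumptions made for this illustration.

```python
from itertools import product

def classify(formula, n):
    """Classify a proposition in n variables by evaluating every interpretation."""
    values = [formula(*v) for v in product([True, False], repeat=n)]
    if all(values):
        return "tautology"        # True for every interpretation
    if not any(values):
        return "contradiction"    # False for every interpretation (unsatisfiable)
    return "contingency"          # True for some interpretations, False for others

print(classify(lambda p: p or not p, 1))    # Law of Excluded Middle
print(classify(lambda p: p and not p, 1))   # Law of Contradiction
print(classify(lambda p, q: p and q, 2))
```

In these terms, a proposition is satisfiable exactly when it is classified as a tautology or a contingency.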

5.3.2. De Morgan’s Laws

Two important logical equivalences are known as De Morgan’s Laws. These describe how to "distribute" the $\neg$ operator across the $\land$ and $\lor$ operators.

De Morgan’s Laws

$\neg (p \land q)\equiv \neg p \lor \neg q$

$\neg (p \lor q)\equiv \neg p \land \neg q$

De Morgan’s Laws can be verified by creating truth tables for $\neg (p \land q) \leftrightarrow \neg p \lor \neg q$ and $\neg (p \lor q) \leftrightarrow \neg p \land \neg q$ to show that these propositions are True for every interpretation of $p$ and $q$.
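
The same verification can be done in a few lines of Python, using `==` on Boolean values in the role of the biconditional. This is a sketch for illustration; the variable names are made up.

```python
from itertools import product

pairs = list(product([True, False], repeat=2))

# not (p and q)  has the same truth value as  (not p) or (not q)
first_law = all((not (p and q)) == ((not p) or (not q)) for p, q in pairs)

# not (p or q)  has the same truth value as  (not p) and (not q)
second_law = all((not (p or q)) == ((not p) and (not q)) for p, q in pairs)

print(first_law, second_law)
```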

5.3.3. Some Other Logical Equivalencies

Here is a collection of additional equivalencies of compound propositions. Each of these can be verified by constructing a truth table to show that the biconditional of the left-hand side and the right-hand side of the logical equivalence is true for all interpretations of the propositional variables.

Double Negation: \[ p \equiv \neg (\neg p) \]

Commutative laws: \[ p \lor q \equiv q \lor p \] \[ p \land q \equiv q \land p \]

Associative laws: \[ p \lor (q \lor r) \equiv (p \lor q) \lor r \] \[ p \land (q \land r) \equiv (p \land q) \land r \]

Distributive laws: \[ p \lor (q \land r) \equiv (p \lor q) \land (p \lor r) \] \[ p \land (q \lor r) \equiv (p \land q) \lor (p \land r) \]

5.3.4. Disjunctive Normal Form (DNF)

It is traditional to focus on negation $\neg$, conjunction $\land$, and disjunction $\lor$ as the three primary logical operations. This is because any compound proposition can be rewritten in terms of these three operations and the propositional variables present in the original compound proposition.

One way to justify this is by using an expression in disjunctive normal form (DNF), which is a disjunction of one or more conjunctions of propositional variables and their negations. In the DNF constructed below, at most one of the conjunctions can be True for any given interpretation of the propositional variables. This description should become clearer after reading the following example.

Example 13 - Finding A Logically Equivalent Proposition (DNF) From A Truth Table

Suppose we have a truth table for an unknown compound proposition. Perhaps someone wrote the truth table but did not write down the expression for the compound proposition in the header of the rightmost column.

\(p\)	\(q\)	\(r\)	\(\text{unknown}\)
T	T	T	F
T	T	F	F
T	F	T	T
T	F	F	F
F	T	T	F
F	T	F	T
F	F	T	T
F	F	F	F

We can write a new compound proposition that is equivalent to the unknown one, using the propositional variables $p$, $q$, and $r$ and the logical operators $\neg$, $\land$ and $\lor$ as follows:

For each row of the truth table that has T in the rightmost column, write the conjunction that would have a T in only that one row of its truth table.
Form the disjunction of all the conjunctions found in the previous step. This new expression is called a disjunctive normal form (DNF) for the unknown proposition.

For the truth table above, we have three rows with T in the rightmost column. The first of the three rows corresponds to $p \land \neg q \land r$, which is only True if $p$ is True, $q$ is False, and $r$ is True. In the same way, the second of the three rows corresponds to $\neg p \land q \land \neg r$, and the third of the three rows corresponds to $\neg p \land \neg q \land r$. We now form the disjunction of these three expressions. \[(p \land \neg q \land r) \lor (\neg p \land q \land \neg r) \lor (\neg p \land \neg q \land r)\]

This new compound proposition has a truth table that is the same as the one for the unknown proposition. This means that the expression we found is logically equivalent to the unknown proposition.

\(p\)	\(q\)	\(r\)	\(\text{unknown}\)	\((p \land \neg q \land r) \lor (\neg p \land q \land \neg r) \lor (\neg p \land \neg q \land r)\)
T	T	T	F	F
T	T	F	F	F
T	F	T	T	T
T	F	F	F	F
F	T	T	F	F
F	T	F	T	T
F	F	T	T	T
F	F	F	F	F
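
The two-step construction used in this example is mechanical, so it can be sketched in code. The function name `dnf_from_table` is hypothetical, and the sketch assumes the output column is listed in the row order used in this chapter (all True first, all False last).

```python
from itertools import product

def dnf_from_table(names, outputs):
    """Build a DNF string from the output column of a truth table."""
    rows = product([True, False], repeat=len(names))
    terms = []
    for values, output in zip(rows, outputs):
        if output:  # keep only the rows with T in the rightmost column
            literals = [name if value else "¬" + name
                        for name, value in zip(names, values)]
            terms.append("(" + " ∧ ".join(literals) + ")")
    return " ∨ ".join(terms)  # empty string if no row is T

# Rightmost column of the truth table for the unknown proposition.
unknown = [False, False, True, False, False, True, True, False]
print(dnf_from_table(["p", "q", "r"], unknown))
```

Running the sketch prints the same three conjunctions that were found by hand above.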

5.3.5. Conjunctive Normal Form (CNF)

In some applications of propositional logic, it is more useful to find a logically equivalent expression for a given proposition that is written as a conjunction of several disjunctions. This conjunctive normal form (CNF) can be constructed as shown in the following example.

Example 14 - Finding A Logically Equivalent Proposition (CNF) From A Truth Table

Consider the same unknown proposition we used in the previous example.

\(p\)	\(q\)	\(r\)	\(\text{unknown}\)
T	T	T	F
T	T	F	F
T	F	T	T
T	F	F	F
F	T	T	F
F	T	F	T
F	F	T	T
F	F	F	F

One way to find a CNF is as follows.

Find the disjunctive normal form for the negation of the unknown proposition.
Apply De Morgan’s Laws to the DNF for the negation of the unknown proposition found in the first step - the result will be a CNF for the double negation of the unknown proposition (which is logically equivalent to the unknown proposition).

\(p\)	\(q\)	\(r\)	\(\text{unknown}\)	\(\neg \text{unknown}\)
T	T	T	F	T
T	T	F	F	T
T	F	T	T	F
T	F	F	F	T
F	T	T	F	T
F	T	F	T	F
F	F	T	T	F
F	F	F	F	T

From the truth table above, we obtain the following DNF for the negation of the unknown proposition: $(p \land q \land r) \lor (p \land q \land \neg r) \lor (p \land \neg q \land \neg r) \lor (\neg p \land q \land r) \lor (\neg p \land \neg q \land \neg r)$.

Next, we negate the DNF, using De Morgan’s Laws, and simplify the resulting expression \[\neg [ (p \land q \land r) \lor (p \land q \land \neg r) \lor (p \land \neg q \land \neg r) \lor (\neg p \land q \land r) \lor (\neg p \land \neg q \land \neg r) ],\] which simplifies to the CNF we wanted to find, \[(\neg p \lor \neg q \lor \neg r) \land (\neg p \lor \neg q \lor r) \land (\neg p \lor q \lor r) \land (p \lor \neg q \lor \neg r) \land (p \lor q \lor r).\]

The last expression is logically equivalent to the unknown proposition.
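
Applying De Morgan’s Laws to each conjunction of the negation's DNF amounts to a shortcut: every row with F in the unknown column contributes one disjunction in which each variable that is True in that row appears negated. The sketch below takes that shortcut; the name `cnf_from_table` is hypothetical, and the row order is assumed to match the tables in this chapter.

```python
from itertools import product

def cnf_from_table(names, outputs):
    """Build a CNF string from the output column of a truth table."""
    rows = product([True, False], repeat=len(names))
    clauses = []
    for values, output in zip(rows, outputs):
        if not output:  # one disjunction for each row with F in the output column
            literals = ["¬" + name if value else name
                        for name, value in zip(names, values)]
            clauses.append("(" + " ∨ ".join(literals) + ")")
    return " ∧ ".join(clauses)

# Rightmost column of the truth table for the unknown proposition.
unknown = [False, False, True, False, False, True, True, False]
print(cnf_from_table(["p", "q", "r"], unknown))
```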

Here is a website that allows you to build the DNF and CNF for a given propositional function.

5.3.6. Functional Completeness

A set $S$ of logical operators is called functionally complete if every compound proposition is logically equivalent to a compound proposition involving only operators that are members of $S.$

Theorem

The set $\{ \neg , \land , \lor \}$ is functionally complete.

Proof

An informal justification can use the method shown in the previous examples for finding a DNF or CNF. A formal proof requires the mathematical induction proof technique and the recursive definition of well-formed formulae given earlier in this chapter.

The importance of the NAND and NOR in electronic circuits arises from the following theorem.

Theorem

The set $\{ \uparrow \}$ is functionally complete. That is, any compound proposition is logically equivalent to a compound proposition involving the same variables and only the $\uparrow$ operator.
The set $\{ \downarrow \}$ is functionally complete. That is, any compound proposition is logically equivalent to a compound proposition involving the same variables and only the $\downarrow$ operator.

Proof

These proofs are exercises for you. See the Challenge Exercises at the end of this chapter.

5.4. Predicates and Quantifiers

Up to this point, most of our propositions have been of the form "Sarah earned a B.S. in Computer Science" - the proposition describes a single individual constant (in this case, "Sarah.")

However, we often need to discuss an entire category of individuals at once, which is equivalent to replacing the constant "Sarah" by a variable. We will discuss this idea in this section.

5.4.1. Predicates

A predicate is a statement that includes one or more variables such that when values are assigned to the variables the predicate becomes a proposition.

Example 15 - Predicates

$x \leq 3$
Computer $c$ is infected.
Country $x$ is on continent $y$.

Predicates are denoted as $P(x)$ or $Q(x,y)$ where $P$ and $Q$ represent the statements and $x$ and $y$ are variables. After a value is assigned to each variable, the predicate becomes a proposition which has a truth value. That is, we "evaluate" a predicate by substituting inputs into the variables and get a proposition as the output.

Example 16

Let $P(x)$ be the predicate $x \leq 3$.

What are the propositions $P(5)$ and $P(2)$? What are the truth values of $P(5)$ and $P(2)$?
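
The interactive listing is not reproduced in this text. As a minimal sketch, a predicate can be modeled as a Python function that returns a Boolean; the helper name `describe` is made up for this illustration.

```python
def P(x):
    """The predicate x <= 3: substituting a value for x gives a proposition."""
    return x <= 3

def describe(x):
    """State the proposition P(x) in words along with its truth value."""
    return f"P({x}) is the proposition '{x} <= 3', which is {P(x)}"

print(describe(5))
print(describe(2))
```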

Example 17

Let $P(n)$ be the predicate "The sum of the first $n$ positive odd integers is equal to $n^{2}$."

What are the propositions $P(1{\small,}000)$ and $P(1{\small,}000{\small,}000)$? Notice that the code outputs the two propositions as strings (of type str in Python). The predicate itself does not tell us whether the propositions it outputs are True or False.
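
The interactive listing is not reproduced in this text. The sketch below separates the two ideas: `P` only produces the proposition as a string, while `check_P` (a hypothetical helper added for this illustration) decides a single proposition by adding up the odd integers directly.

```python
def P(n):
    """Return the proposition P(n) as a string; this does not decide its truth value."""
    return f"The sum of the first {n:,} positive odd integers is equal to {n:,}^2."

def check_P(n):
    """Decide P(n) by direct computation, which is practical only for moderate n."""
    return sum(range(1, 2 * n, 2)) == n ** 2   # 1 + 3 + ... + (2n - 1)

print(P(1_000))
print(P(1_000_000))
print(check_P(1_000), check_P(1_000_000))
```

Checking individual values this way does not establish the statement for every positive integer $n$.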

Example 18

Let $Q(x,y)$ be the statement $x-y=4$.

The Python code displays each of the three propositions $Q(6,2)$, $Q(1,5)$, and $Q(-2,2)$ and describes their truth values.
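
The interactive listing is not reproduced in this text. A possible sketch of a two-variable predicate, written as a function with two parameters, is:

```python
def Q(x, y):
    """The predicate x - y = 4."""
    return x - y == 4

for x, y in [(6, 2), (1, 5), (-2, 2)]:
    print(f"Q({x},{y}) is the proposition '{x} - {y} = 4', which is {Q(x, y)}")
```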

5.4.2. Quantifiers

Consider the statements

For all integers $x$, $x^2\geq 0$.
Some student in the class has a birthday in July.

Each of these statements considers a proposition over an entire population or set, called the domain, and describes whether at least one element, or all of the elements in the domain satisfy the proposition. There are two commonly-used quantifiers, the universal quantifier and the existential quantifier.
The domain is also called the domain of discourse or the universe of discourse.

The Universal Quantifier, $\forall,$ represents the phrases "for all", "for every", and "for each". When it comes before a statement, it means that the statement is true for all values in the domain.

Example 19

Universal Quantifier $\forall x, x + 1 \gt x$

Let $P(x)$ be the statement $x + 1 \gt x$. Is this true for all integers x?
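
Over a finite domain, Python's built-in `all()` behaves like the universal quantifier. The sketch below checks the predicate on the small sample domain [-2, -1, 0, 1, 2]; a True result on a sample is evidence, not a proof, for all integers.

```python
domain = [-2, -1, 0, 1, 2]  # a finite sample of the integers

def P(x):
    """Predicate P(x): x + 1 > x."""
    return x + 1 > x

# all() is True exactly when P(x) is True for every x in the domain.
print(all(P(x) for x in domain))
```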

We use the example domain [-2, -1, 0, 1, 2] because code cannot check all integers.

Example 20

Universal Quantifier $\forall x, x + x \gt x$

Let $P(x)$ be the statement $x + x \gt x$. Is this true for all integers x?
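
A universally quantified statement is False as soon as one counterexample exists. The following sketch, again using the sample domain [-2, -1, 0, 1, 2], collects the elements for which the predicate fails; the variable name `counterexamples` is just illustrative.

```python
domain = [-2, -1, 0, 1, 2]

def P(x):
    """Predicate P(x): x + x > x."""
    return x + x > x

counterexamples = [x for x in domain if not P(x)]
print(all(P(x) for x in domain))  # False, because counterexamples exist
print(counterexamples)
```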

The Existential Quantifier, $\exists,$ represents the statement "there exists", "for some", "at least one". When it comes before a statement, it means the statement is true for at least one value in the domain.

Example 21

Existential Quantifier $\exists x, x^2 = 4$

Let $P(x)$ be the statement $x^2 = 4$. Is this true for at least one integer x?
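
Over a finite domain, Python's built-in `any()` behaves like the existential quantifier. This sketch uses the sample domain [-2, -1, 0, 1, 2]; finding even one witness in the sample is enough to show that the statement is True for the integers.

```python
domain = [-2, -1, 0, 1, 2]

def P(x):
    """Predicate P(x): x^2 = 4."""
    return x ** 2 == 4

witnesses = [x for x in domain if P(x)]
print(any(P(x) for x in domain))
print(witnesses)
```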

Example 22

Existential Quantifier $\exists x, x^3 = 4$

Let $P(x)$ be the statement $x^3 = 4$. Is this true for at least one integer x?
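
The sketch below searches the sample domain [-2, -1, 0, 1, 2] for a witness. Note the asymmetry: a False result on a finite sample does not by itself prove that no integer works. Here a separate argument settles it, since $1^3 = 1 \lt 4 \lt 8 = 2^3$ and cubes increase as $x$ increases.

```python
domain = [-2, -1, 0, 1, 2]

def P(x):
    """Predicate P(x): x^3 = 4."""
    return x ** 3 == 4

# any() is False because no element of the sample satisfies P.
print(any(P(x) for x in domain))
```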

Again, we use the example domain [-2, -1, 0, 1, 2] because code cannot check all integers.

Recall the previous example statements:

For all integers $x$, $x^2 \geq 0$.

Let $P(x)$ be the predicate "$x^2 \geq 0$". Then we write the statement as $\forall x P(x)$, where the domain is the set of all integers. This quantified statement is true, since the square of any nonzero integer is positive and $0^2=0$.

Some student in the class has a birthday in July.

Let $Q(s)$ be the predicate "student $s$ has a birthday in July". Then we write the statement as $\exists s Q(s)$, where the domain is the set of all students in the class. This statement will be true as long as at least one student in the class has a birthday in July. It will be false, otherwise.

5.4.3. Negation of Quantifiers

It is important to consider the negation of a quantified expression.

"Every student in this class has taken Programming Fundamentals."

This is a universally quantified statement and can be expressed as $\forall x P(x)$ where $P(x)$ is the statement "$x$ has taken Programming Fundamentals" and the domain consists of all the students in this class. The negation of the statement would be "It is not true that every student in this class has taken Programming Fundamentals." Equivalently,

"There is a student in this class who has NOT taken Programming Fundamentals."

This is an existentially quantified statement expressed as $\exists x \neg P(x)$.

This demonstrates that the negation of a universally quantified statement is an existential statement. In symbols, we have $\neg \forall x P(x)\equiv \exists x \neg P(x)$.

Similarly, the negation of an existential statement is a universal statement. $\neg \exists x P(x) \equiv \forall x \neg P(x)$.

De Morgan’s Laws with Quantifiers

For any predicate $P(x)$ \[\neg \forall x P(x)\equiv \exists x \neg P(x)\] and \[\neg \exists x P(x) \equiv \forall x \neg P(x)\]
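
On a finite domain, both laws can be spot-checked with `all()` and `any()`. The following is a sketch only: the domain and the three sample predicates are hypothetical choices, and passing the check illustrates the laws rather than proving them.

```python
domain = range(-3, 4)  # hypothetical finite domain: -3, ..., 3

predicates = {
    "x > 0": lambda x: x > 0,        # True for some elements only
    "x*x >= 0": lambda x: x * x >= 0,  # True for every element
    "x == 10": lambda x: x == 10,    # True for no element
}

def de_morgan_holds(P):
    """Check both quantifier laws for predicate P over the finite domain."""
    law1 = (not all(P(x) for x in domain)) == any(not P(x) for x in domain)
    law2 = (not any(P(x) for x in domain)) == all(not P(x) for x in domain)
    return law1 and law2

for name, P in predicates.items():
    print(name, de_morgan_holds(P))
```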

Example 23

Someone in the class can speak Latin.

Using quantifiers, we write this statement as $\exists x L(x)$, where $L(x)$ is the predicate "$x$ can speak Latin" and the domain is the set of students in the class. Its negation is $\forall x \neg L(x)$, which reads:

Every student in the class cannot speak Latin. Equivalently, no student in the class can speak Latin.

You Try

Find the negation of the statement "For all integers $x$, $x^2 \geq x$."

The predicate of a quantified statement could be a compound statement. For instance,

Some dogs are big and fluffy.

This is written as $\exists x (B(x) \land F(x))$, where $B(x)$ is the predicate "$x$ is big," $F(x)$ is the predicate "$x$ is fluffy," and the domain is the set of dogs. Negating this statement gives

$\neg \exists x (B(x) \land F(x)) \equiv \forall x \neg (B(x) \land F(x)) \equiv \forall x (\neg B(x) \lor \neg F(x))$

In words,

Every dog is not big or not fluffy. That is, no dog is both big and fluffy.

5.4.4. Nested Quantifiers

There are times it will take more than one quantifier to express a statement.

For all integers $x$, there exists an integer $y$, such that $x+y=0$.

This statement contains both a universal and an existential quantifier. It is written as $\forall x \exists y S(x,y)$, where $x$ and $y$ are integers and $S(x,y)$ is the predicate $x+y=0$. The statement means that if you have any integer $x$ (for instance $x=5$), then you can find an integer $y$ (for instance $y=-5$) such that $x+y=0$.

The order of the quantifiers matters. $\exists x \forall y S(x,y)$ would be

There exists an integer $x$, such that for all integers $y$, $x+y=0$.

Note that in this statement you find an integer $x$ so that when you add any integer $y$ to it you always get 0.

The first statement, for all integers $x$, there exists an integer $y$ such that $x+y=0$, is true: for any integer $x$ you can choose $y=-x$, and then $x+y=x+(-x)=0$. The second statement, there exists an integer $x$ such that for all integers $y$, $x+y=0$, is false: taking $y=0$ forces $x=0$, but then $x+1=1\neq 0$.
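
The effect of swapping the quantifiers can be seen by nesting `all()` and `any()`. This sketch assumes a hypothetical finite domain that is symmetric about 0, so that $-x$ is in the domain whenever $x$ is; it illustrates the two statements on that sample rather than proving them for all integers.

```python
domain = range(-3, 4)  # -3, ..., 3: contains -x for every x it contains

def S(x, y):
    """Predicate S(x, y): x + y = 0."""
    return x + y == 0

# For all x, there exists y: the inner any() may pick a different y for each x.
forall_exists = all(any(S(x, y) for y in domain) for x in domain)

# There exists x, for all y: a single x must work for every y.
exists_forall = any(all(S(x, y) for y in domain) for x in domain)

print(forall_exists)
print(exists_forall)
```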

Example 24

Let $Q(x,y)$ be the statement $xy=0$. If the domain for both variables consists of all integers, what are the truth values of the following statements?

$Q(0,3)$ is True since $0\cdot 3=0$
$Q(6,2)$ is False since $6\cdot 2=12$
$\exists x Q(x,4)$ is True. Use the value of $x=0$, and since $0\cdot 4=0$ there is at least one integer $x$ so that $x\cdot 4=0$.
$\forall x \exists y Q(x,y)$ is True. If you have any integer $x$, you can pick the value $y=0$ and get $x\cdot 0=0$.

You Try - Determine the truth value of each statement and justify the answer.

$\forall y Q(1,y)$
$\exists x \forall y Q(x,y)$
$\forall x \forall y Q(x,y)$

To negate nested quantifiers, apply De Morgan’s Laws for quantifiers repeatedly, moving the negation inward past one quantifier at a time until it reaches the predicate.

Namely, $\neg \forall x P(x) \equiv \exists x \neg P(x)$ and $\neg \exists x P(x) \equiv \forall x \neg P(x)$.

Example 25 - Negation of quantified statements

Find the negation of the statement "For all integers $x$, there exists an integer $y$ such that $x=-y$."

Solution

Using quantifiers, we write this statement as $\forall x \exists y N(x,y)$, where $N(x,y)$ is the predicate "$x=-y$" and the domain of $x$ and $y$ is the integers. Its negation is $\exists x \forall y \neg N(x,y)$.

There exists an integer $x$, such that for all integers $y$, $x \neq -y$.

You Try

Find the negation of the statement "Some student in the class will solve every practice problem."

Hint: Let $x$ be a student in the class, $y$ be a practice problem, and $P(x,y)$ be the statement "student $x$ has solved practice problem $y$".

5.5. Applications of Logic

Remixer’s Note: This section is taken from the original “Discrete Math” book with minor edits to include base-two notation and a link to the NANDgame website.

In this section we consider two applications of logic to information technology and computer science. The first involves bitwise operations, and the second designing and analyzing logic circuits.

5.5.1. Bitwise operations

A bitwise operation is a Boolean operation that acts on the individual bits ($0$s or $1$s) of its operand(s). The bitwise operations are summarized below.

Bitwise Operations

The bitwise AND, denoted by "&", applies the conjunction $\land$ to the corresponding bits of the two operands.
The bitwise OR, denoted by "$|$", applies the disjunction $\lor$ to the corresponding bits of the two operands.
The bitwise XOR, denoted by "${}^{\wedge}$", applies the exclusive or $\oplus $ to the corresponding bits of the two operands.
The bitwise NOT, denoted by "~" in Python, Java, and C, applies the negation $\neg$ to each bit of its single operand (flips $0\longleftrightarrow 1$ ).

We summarize the truth tables for the bitwise boolean operators.

$p$	$q$	$AND$ &	$OR$ $|$	$XOR$ ${}^{\wedge}$	$IF$ $\Rightarrow$	$IFF$ $\Leftrightarrow$
1	1	1	1	0	1	1
1	0	0	1	1	0	0
0	1	0	1	1	1	0
0	0	0	0	0	1	1

Example 26 - Bitwise Operations

Find the bitwise $AND, OR, XOR$ for the following binary numbers,

\[ A = 111101\] \[ B = 001111\]

Solution

Using the truth tables for Boolean operators, where the results are noted in the bottom row, we have

Bitwise AND	Bitwise OR	Bitwise XOR
111101	111101	111101
001111	001111	001111
001101	111111	110010
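
The same results can be reproduced with Python's integer bitwise operators. This is a small sketch; the helper `bits` and the 6-bit `mask` are illustrative names, and the mask is needed because Python integers are not fixed-width, so `~A` on its own is a negative number.

```python
A = 0b111101
B = 0b001111

def bits(n, width=6):
    """Format a non-negative integer as a zero-padded binary string."""
    return format(n, f"0{width}b")

print(bits(A & B))  # bitwise AND
print(bits(A | B))  # bitwise OR
print(bits(A ^ B))  # bitwise XOR

mask = 0b111111          # keep only the lowest six bits
print(bits(~A & mask))   # bitwise NOT of A, restricted to six bits
```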

5.5.2. Logic Circuits

Logic circuits are important in designing the arithmetic and logic units of a computer processor.

Consider the problem of adding two $8$-bit numbers in binary. In binary, $(0)_2 + (0)_2 = (0)_2$ and $(0)_2 + (1)_2 = (1)_2 + (0)_2 = (1)_2$, but as in decimal addition, $(1)_2 + (1)_2 = (10)_2$ requires a carry, that is, the "sum bit" is $0$ with a "carry bit" of $1$ to the next significant column on the left.
Note: The bits are enclosed in parentheses which are followed by the subscript $_2$ to emphasize that binary notation, not decimal notation, is being used. This notation is covered in much more detail in the Number Bases chapter.

Now think of adding a single column of two binary digits, say $A$ and $B$. The inputs are the bits $A$ and $B$ together with the carry in from the previous column, say $C_{in}$. The outputs are the sum $S$ and the carry out to the next column, say $C_{out}$. These are the basic components of what is called a binary adder.

Figure 8. A Binary adder

The logic table for binary addition based on the digital inputs $A, B, C_{in}$, and digital outputs $S$ and $C_{out}$ is summarized in the table.

Table 1. Truth table for Binary adder
$A$	$B$	$C_{in}$	$\mathbf{S}$	$\mathbf{C_{out}}$
1	1	1	$\mathbf{1}$	$\mathbf{1}$
1	1	0	$\mathbf{0}$	$\mathbf{1}$
1	0	1	$\mathbf{0}$	$\mathbf{1}$
1	0	0	$\mathbf{1}$	$\mathbf{0}$
0	1	1	$\mathbf{0}$	$\mathbf{1}$
0	1	0	$\mathbf{1}$	$\mathbf{0}$
0	0	1	$\mathbf{1}$	$\mathbf{0}$
0	0	0	$\mathbf{0}$	$\mathbf{0}$

You can write logically equivalent expressions for $S$ and $C_{out}$ by first writing expressions in Disjunctive Normal Form and then simplifying those expressions using some of the other logical equivalences listed earlier in the chapter: \[S \equiv \left(\neg A\land \neg B\land C_{in}\right)\lor \left(\neg A\land B\land \neg C_{in}\right)\lor \left(A\land \neg B\land \neg C_{in}\right)\lor \left(A\land B\land C_{in}\right) \] \[ C_{out} \equiv (A\land B)\lor \left(B\land C_{in}\right)\lor \left(A\land C_{in}\right)\]
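
The two expressions can be checked against ordinary arithmetic on all eight input rows. The sketch below is one way to do this; the function name `full_adder` is hypothetical, and the check relies on the fact that the sum bit is the total modulo 2 and the carry bit is the total divided by 2.

```python
from itertools import product

def full_adder(a, b, c_in):
    """Evaluate the S and C_out expressions for bits a, b, c_in (each 0 or 1)."""
    a, b, c_in = bool(a), bool(b), bool(c_in)
    s = ((not a and not b and c_in) or (not a and b and not c_in)
         or (a and not b and not c_in) or (a and b and c_in))
    c_out = (a and b) or (b and c_in) or (a and c_in)
    return int(s), int(c_out)

# Compare the logical expressions with a + b + c_in for every row of the table.
for a, b, c_in in product([1, 0], repeat=3):
    total = a + b + c_in
    assert full_adder(a, b, c_in) == (total % 2, total // 2)
    print(a, b, c_in, *full_adder(a, b, c_in))
```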

These logical outputs are implemented from the inputs $(A,B, C_{in})$ by electronic circuits called logic gates.

The basic logic gates are the Inverter (or NOT gate), the AND gate, the OR gate, and the XOR gate. The graphical representation for each is shown below.

Figure 9. Basic gates

We end this section by first analyzing logic circuits to give their outputs in terms of their input variables, and then, constructing logic circuits based on logical statements.

Example 27 - Output of a logic circuit in terms of its inputs

Determine the output of the following logic circuit in terms of the input variables, $p, q$, and $r$.

Solution

Proceeding left to right, determine the output of the leftmost gates first using the basic gate outputs.

The output of the logic circuit is $ ( p \lor q)\land ( \neg p \lor \neg q)$

In the next two examples, we design logic circuits based on logical propositions. The idea is to work backward using order of operations from the right to the left.

Example 28 - Design a Logic Circuit

Design a logic circuit for $(p\vee\lnot\ q)\land\lnot\ p$.

Solution

Working backwards from right to left we have the following sequence of gates

1) An AND gate $(p\vee\lnot\ q)\underline{\land} \lnot\ p$.

2) The inputs to the AND gate are $(p\vee\lnot\ q)$ and $\lnot\ p$.

3) These inputs come from the output of an INVERTER, for $\underline{\lnot}\ p$ and an OR gate $(p \underline{\vee}\lnot\ q)$.

4) There are two inputs to the OR gate $(p \underline{\vee}\lnot\ q)$, being $p$, and the output of an INVERTER, $\underline{\lnot} q$.

Putting these now in left to right order we obtain the following logic circuit.
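
The wiring described in steps 1 to 4 can also be simulated in code by treating each gate as a small function and composing them. This is a sketch under the assumption that signals are Python Booleans; the gate function names are illustrative.

```python
from itertools import product

def NOT(a):
    return not a

def AND(a, b):
    return a and b

def OR(a, b):
    return a or b

def circuit(p, q):
    """(p OR NOT q) AND NOT p, wired gate by gate from inputs to output."""
    return AND(OR(p, NOT(q)), NOT(p))

for p, q in product([True, False], repeat=2):
    print(p, q, circuit(p, q))
```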

Example 29 - Design a Logic Circuit

Design a logic circuit for $r\land (p\lor (r\land \neg q))$.

Solution

Working backwards from right to left we have the following sequence of gates

1) An AND gate $r\underline{\land} (p\lor (r\land \neg q))$.

2) The inputs to the AND gate are $r$ and $p\lor (r\land \neg q)$.

3) The input, $p\lor (r\land \neg q)$, comes from the output of an OR gate for $p \underline{\lor} (r\land \neg q)$.

4) The inputs to the OR gate, $p \underline{\lor} (r\land \neg q)$, are $p$ and $(r\land \neg q)$, which is an AND gate.

5) The inputs to the AND gate, $r \underline{\land} \neg q$, are $r$ and the output of an INVERTER, $\underline{\neg} q$.

Putting these now in left to right order we obtain the following logic circuit.

How about a game of nand?

Here is a link to a website that lets you build a computer, starting from the most basic level of the NAND component.

5.6. Exercises

Remixer’s Note: This section is taken from the original “Discrete Math” book with no changes.

Which of these statements are propositions? Explain your reasoning
1. Is Atlanta the capital of Georgia?
2. All birds fly
3. $2\ \times\ \ 3\ =\ 5$
4. $5\ +\ 7\ =\ 7+5$
5. $x\ +\ 2\ =\ 11$
6. Answer this question.
7. The rain in Spain
Construct truth tables for,
1. $a\vee b\Rightarrow\lnot b$
2. $(a\vee\lnot b)\ \Leftrightarrow\ a$
3. $(a\Rightarrow b)\ \bigwedge\ (b\ \bigwedge\ \lnot c)$
4. $(a\ \bigvee\ b)\ \Rightarrow\ (\ \lnot c\ \bigvee\ a)$
5. $(a\ \bigvee\ b)\ \bigwedge\ (c\ \bigvee\lnot d\ )$
6. $(\lnot c\ \bigwedge\ \ b)\ \bigvee\ \ (a\Rightarrow\ \lnot d\ )$
Using truth tables, determine if each of the following is a tautology, contradiction, or neither (conditional)
1. $\neg ((a\lor b)\lor (\neg a\land \neg b))$
2. $\left(\left(a\vee b\right)\land\lnot a\right)\Rightarrow b$
3. $\left(\left(a\vee b\right)\land a\right)\Rightarrow b$
4. $(p\land r)\lor (\neg p\land \neg r)$
5. $\neg ((p\lor q)\lor (\neg p\land (\neg q\lor r)))$
6. $\neg (p\land q)\lor (q\lor r)$
Using truth tables determine which of the following are equivalent
1. $\left(p\Rightarrow q\right)\Rightarrow r$,
  
  $\left(p\land\lnot q\right)\vee r,$ and
  
  $\left(p\land\lnot q\right)\land r$
2. $(a\lor b)\land c,$
  
  $(c\land a)\lor (c\land b),$ and
  
  $\neg ((\neg a\land \neg b)\lor \neg c)$
Let $C(x)$ be the statement "$x$ has visited Canada." where the domain consists of the students at GGC. Express each of the quantifications in English.
1. $\exists x C(x)$
2. $\forall x C(x)$
3. How would you determine whether each of these statements is true or false?
Determine the truth value of each of these statements if the domain for all variables, $m , n$ is the set of all integers, $\mathbb{Z}$, explaining your reasoning.
1. $\forall n:\left(n^2\geq 1\right)$
2. $\forall n:\left(n^2\geq 0\right)$
3. $\ \exists\ n:(n^2=3)$
4. $\ \exists\ m\forall\ n:(m+n=n-m)$
5. $\forall\ n\exists\ m:\ (n\cdot\ m=m)$
6. $\ \exists\ n\forall\ m:\ (n\cdot\ m=m)$
7. $\ \exists\ n\forall\ m:\ (n\cdot\ m=n)$
Consider each of the compound propositions. (i) Translate each using logical symbols and letters, stating what each letter represents, (ii) Negate each using plain English sentences, and (iii) Translate the negated statements using logical symbols and quantifiers.
1. If it snows today, then I will go skiing tomorrow.
2. Mei walks or takes the bus to class.
3. Every person in this class understands mathematical induction.
4. In every mathematics class there is some student who falls asleep during lectures.
5. There is a building on the campus of some college in the United States in which every room is painted white.
Let $p$ be the proposition “My bicycle needs a tire replaced,” $q$ be the proposition “I will go cycling,” and $r$ be the proposition “Rain is in the forecast.”
1. Express each of these compound propositions using plain English sentences.
  1. $\neg p\vee q$
  2. $\neg p\Rightarrow \neg q$
  3. $(\neg p\wedge r)\Rightarrow q$
  4. $(\neg p\wedge r)\Rightarrow q$
  5. $(\neg p\wedge q)\vee r$
2. Write these compound propositions using $p$, $q$ and, $r$ and logical connectives (including negation).
  1. If my bicycle tire does not need replacement, I will go cycling.
  2. My bicycle tire does not need replacement, there is rain in the forecast, but I will go cycling.
  3. Whenever there is rain in the forecast, I do not go cycling.
  4. If there is rain in the forecast or my tire needs replacement I will not go cycling.
  5. Rain is not forecast whenever I go cycling.
  6. Rain is not forecast and my tire does not need replacement whenever I go cycling.
Design logic circuits with the following output
1. $(p\lor (q\land \neg r))\lor \neg (p\land q)$
2. $(p\lor (q\land r))\land \neg (p\land q)$
Consider the predicate $Q(x,y): x\ \cdot\ y=5$, where the domain of $x$ and $y$ is all positive real numbers $\mathbb{R}^+$, or $x,\ y\ >0$. Determine the truth value of each of the following, and explain your reasoning.
1. $Q(1,5)$
2. $Q\left(2,\frac{5}{2}\right)$
3. $\exists\ y,\ Q\left(7,y\right)$
4. $\ \forall\ y,\ Q\left(7,y\right)$
5. $\exists\ x\ \forall\ y,\ Q\left(x,y\right)$
6. $\ \forall\ \ x\ \exists\ \ y,\ Q\left(x,y\right)$
Consider the predicate $R(x,y):\ 2x+y=0$, where the domain of $x$ and $y$ is all rational numbers, $\mathbb{Q}$. Determine the truth value of each of the following, and explain your reasoning.
1. $R(0,0)$
2. $R(2,-1)$
3. $R\left(\frac{1}{5},-\frac{2}{5}\right)$
4. $\exists y,\ R\left(0.2,y\right)$
5. $\ \forall y,\ R\left(7,y\right)$
6. $\exists\ x\forall\ y,\ R\left(x,y\right)$
7. $\ \forall\ x\ \exists\ y,\ R\left(x,y\right)$
Calculate the bitwise $AND$, the bitwise $OR$, and the bitwise $XOR$ of the following pairs of bytes, or sequence of bytes
1. $01111111$ and $11101001$
2. $1110010111111010$ and $0101110101100011$
Give the output for each of the logic circuits in terms of the input variables,
1. The logic circuit, with input variables, $p, q$, $r$.
2. The logic circuit, with input variables, $a, b$, $c$.
Design a logic circuit for $r\land (p\lor (r\land \neg q))$.

5.7. Challenge Exercises

Complete parts (a) and (b) to show that every compound proposition is logically equivalent to one that uses the same propositional variables but only the NAND operator $\uparrow.$
1. For each of the three propositions $\neg p$, $p \land q,$ and $p \lor q,$ write a logically equivalent proposition that uses only $p,$ $q,$ and $\uparrow.$
2. Use the theorem that every compound proposition is equivalent to a compound proposition in disjunctive normal form to justify that the compound proposition is also equivalent to one that uses only the NAND operator $\uparrow.$
Complete parts (a) and (b) to show that every compound proposition is logically equivalent to one that uses the same propositional variables but only the NOR operator $\downarrow.$
1. For each of the three propositions $\neg p$, $p \land q,$ and $p \lor q,$ write a logically equivalent proposition that uses only $p,$ $q,$ and $\downarrow.$
2. Use the theorem that every compound proposition is equivalent to a compound proposition in disjunctive normal form to justify that the compound proposition is also equivalent to one that uses only the NOR operator $\downarrow.$

6. Proofs: Basic Techniques

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on October 6, 2025
Revised example of valid and invalid arguments. Introduced new example using rules of inference with predicates.
Moved explanation of converse error using Euler diagrams to later in the chapter.
Revised example 4+ Made minor wording changes and fixed typos.
NOTE: Statement that universal generalization is used in proofs by mathematical induction has been removed.

Recall from the Logic chapter that an argument is a finite sequence of statements that ends with a final statement, called the conclusion, that is based on inferences made from the earlier statements, called premises or hypotheses.

An argument is valid if it is of a form such that if all premises are True then the conclusion must be True, too. An argument is not valid just because it exists! Just as a proposition is a single statement that may be either True or False (but not both), an argument is a finite sequence of statements that may be either valid or invalid (but not both).

A proof is a valid argument made up of propositions. In a proof, some premises may be axioms or postulates, which are propositions that we simply ASSUME to be True. Other premises used in a proof may be previously-proven propositions called theorems. There are many other terms used for theorems depending on the context, such as lemma (a minor theorem needed to prove a more important major theorem) and corollary (a theorem that is a conclusion based on a premise that is a more important theorem), but each of these specialized terms describes a theorem.

Key terms and concepts covered in this chapter:

Propositional inference rules (concepts of modus ponens and modus tollens)
- Notions of implication, equivalence, converse, inverse, contrapositive, negation, and contradiction
The structure of mathematical proofs
Proof techniques
- Direct proofs
- Proof by counterexample (Disproving by counterexample)
- Proof by contraposition (proof by contrapositive)
- Proof by contradiction
To be added to this chapter
- Circles of implication (“Ringschluss”) and logical equivalence

6.1. Rules of Inference for Propositions

To create a proof, you must proceed from True propositions to other True propositions without introducing False propositions into the argument. To do this, you use rules of inference, which are ways to draw a True conclusion from one or more premises that are already known to be True (or assumed to be True). That is, a rule of inference is an argument form that corresponds to a tautology, and so is a valid argument form.

Example 1 - Notation For A Rule Of Inference

A rule of inference can be represented as follows.

$p_{1}$
$p_{2}$
$\vdots$
$p_{n}$
$\therefore q$

The propositional variables $p_{1},\,p_{2},\,\ldots,\,p_{n}$ represent the premises of the argument, and $q$ represents the conclusion of the argument. The symbol $\therefore$ is read as “therefore.”

This rule of inference is interpreted to mean that if all of the propositions $p_{1},\,p_{2},\,\ldots,\,p_{n}$ are True then the proposition $q$ MUST be True. The rule of inference in this example corresponds to the tautology $(p_{1} \land p_{2} \land \ldots \land p_{n})\rightarrow q$.

Note that the conclusion $q$ must be the last proposition, but the order in which the premises $p_{1},\,p_{2},\,\ldots,\,p_{n}$ are listed in the argument does not matter since we use the conjunction of all premises to prove the conclusion. The premises $p_{1},\,p_{2},\,\ldots,\,p_{n}$ are usually presented in an order that follows the flow of thought.

Example 2 - Valid and Invalid Arguments

Consider the following three arguments.

A Valid Argument

The following argument is based on an example in the Logic chapter.

Sarah earned a B.S. in Computer Science.
If Sarah earned a B.S. in Computer Science, then Sarah earned a C or better in Discrete Math.
Therefore, Sarah earned a C or better in Discrete Math.

The argument is of the form

p
if p then q
$\therefore$ q

This argument form is named modus ponens which is translated very roughly from Latin as “method of affirming.” Modus ponens is a valid argument form because it corresponds to the tautology $(p \land (p \rightarrow q)) \rightarrow q$.

\(p\)	\(q\)	\(p \rightarrow q\)	\(p \land (p \rightarrow q)\)	\(q\)	\((p \land (p \rightarrow q)) \rightarrow q\)
T	T	T	T	T	T
T	F	F	F	F	T
F	T	T	F	T	T
F	F	T	F	F	T

That is, modus ponens is a rule of inference.
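
The truth table check can be automated: a compound proposition is a tautology when it is True in every row. The following sketch uses hypothetical helper names `implies` and `is_tautology`.

```python
from itertools import product

def implies(a, b):
    """The conditional a -> b, which is False only when a is True and b is False."""
    return (not a) or b

def is_tautology(formula, num_vars):
    """True when formula is True for every assignment of truth values."""
    return all(formula(*row) for row in product([True, False], repeat=num_vars))

def modus_ponens(p, q):
    """(p AND (p -> q)) -> q"""
    return implies(p and implies(p, q), q)

print(is_tautology(modus_ponens, 2))
```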

An Invalid Argument

Consider this second argument.

Arya earned a C or better in Discrete Math.
If Arya earned a B.S. in Computer Science, then Arya earned a C or better in Discrete Math.
Therefore, Arya earned a B.S. in Computer Science.

This argument is NOT valid. If you assume that the first two propositions are True, you do not have enough information to reach the conclusion that Arya earned a B.S. in Computer Science: Arya may have earned that degree, or may still be working towards the degree, or may have changed majors and earned a degree in the new major, or perhaps there is some other possibility - you cannot determine whether the conclusion is True or False based on the assumption that the two premises are True.

The argument corresponds to the argument form

q
if p then q
$\therefore$ p

This argument form is an example of a fallacy or non sequitur, which is Latin for “it does not follow.” The argument form is invalid because it corresponds to a proposition that is NOT a tautology, as can be seen in the truth table for the compound proposition $(q \land (p \rightarrow q)) \rightarrow p$.

\(p\)	\(q\)	\(p \rightarrow q\)	\(q \land (p \rightarrow q)\)	\(p\)	\((q \land (p \rightarrow q)) \rightarrow p\)
T	T	T	T	T	T
T	F	F	F	T	T
F	T	T	T	F	F
F	F	T	F	F	T

Notice that there is at least one row of the truth table in which both $q$ and $p \rightarrow q$ are True but $p$ is False! This means that you CANNOT infer that $p$ is True whenever $(q \land (p \rightarrow q))$ is True. The argument form is invalid because $(q \land (p \rightarrow q)) \rightarrow p$ is not a tautology.

This particular fallacy is used by people so often that it has its own name: The converse error, or fallacy of the converse.
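
To see exactly where the converse error fails, you can list the rows of the truth table in which the compound proposition is False. This is a minimal sketch; `implies`, `converse_error`, and `falsifying_rows` are illustrative names.

```python
from itertools import product

def implies(a, b):
    """The conditional a -> b."""
    return (not a) or b

def converse_error(p, q):
    """(q AND (p -> q)) -> p"""
    return implies(q and implies(p, q), p)

# Rows (p, q) in which both premises are True but the conclusion is False.
falsifying_rows = [(p, q) for p, q in product([True, False], repeat=2)
                   if not converse_error(p, q)]
print(falsifying_rows)
```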

A Third Argument - You Try

Write the argument form for the following argument.

Jing did not earn a B.S. in Computer Science.
If Jing earned a B.S. in Computer Science, then Jing earned a C or better in Discrete Math.
Therefore, Jing did not earn a C or better in Discrete Math.

Find a compound proposition that corresponds to the argument form you wrote, and write the truth table for that compound proposition.

Question

Is the argument valid?

Hint

The argument is valid if and only if the compound proposition is a tautology.

Answer

No. The argument is invalid and is an example of the inverse error or fallacy of the inverse.

Another way to see that this argument is invalid is to consider a case where Jing did earn a C or better in Discrete Math even though the two premises are True; for example, Jing could have earned a B.S. in Mathematics instead of Computer Science and also earned a C or better in Discrete Math.

In the following subsections we will discuss some of the more common rules of inference.

6.1.1. Transitivity Of The Conditional

The following rule of inference is called pure hypothetical syllogism, but we will use the less formal name transitivity. It is the basis of conditional proof in mathematics.

Transitivity (Pure Hypothetical Syllogism)

$p \rightarrow q$
$q \rightarrow r$
$\therefore p \rightarrow r$

This rule of inference corresponds to the tautology $((p \rightarrow q) \land (q \rightarrow r)) \rightarrow (p \rightarrow r)$.
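
With three propositional variables the truth table has eight rows, which is easy to check by brute force. The sketch below assumes hypothetical helpers `implies` and `is_tautology`.

```python
from itertools import product

def implies(a, b):
    """The conditional a -> b."""
    return (not a) or b

def is_tautology(formula, num_vars):
    """True when formula is True for every assignment of truth values."""
    return all(formula(*row) for row in product([True, False], repeat=num_vars))

def transitivity(p, q, r):
    """((p -> q) AND (q -> r)) -> (p -> r)"""
    return implies(implies(p, q) and implies(q, r), implies(p, r))

print(is_tautology(transitivity, 3))
```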

By applying transitivity multiple times, you can build a finite chain of implications of any length you want:

$p \rightarrow p_{1}$
$p_{1} \rightarrow p_{2}$
$p_{2} \rightarrow p_{3}$
$\vdots$
$p_{k-1} \rightarrow p_{k}$
$p_{k} \rightarrow r$
$\therefore p \rightarrow r$

6.1.2. Rules Of Inference And Fallacies Arising From The Conditional

Recall that if you have propositions $p$ and $q,$ you can form the conditional with hypothesis $p$ and consequent $q,$ which is written as $p \rightarrow q$, as well as three other related conditionals.

$p \rightarrow q$, the conditional
$q \rightarrow p$, the converse of $p \rightarrow q$
$\neg q \rightarrow \neg p$, the contrapositive of $p \rightarrow q$
$\neg p \rightarrow \neg q$, the inverse of $p \rightarrow q$

Also, recall that $(p \rightarrow q) \equiv (\neg q \rightarrow \neg p)$. That is, $(p \rightarrow q) \leftrightarrow (\neg q \rightarrow \neg p)$ is a tautology. This means that the conditional is logically equivalent to its contrapositive. The conditional is NOT logically equivalent to either its converse or its inverse, as was shown using truth tables in the Logic chapter.

From the four conditionals you can get two rules of inference and two fallacies. Together, these four argument forms are referred to as the mixed hypothetical syllogisms.

First, here are the two rules of inference.

Modus Ponens (“Method Of Affirming”)

$p \rightarrow q$
$p$
$\therefore q$

This rule of inference corresponds to the tautology $((p \rightarrow q) \land p) \rightarrow q$.

Modus Tollens (“Method Of Denying”)

$p \rightarrow q$
$\neg q$
$\therefore \neg p$

This rule of inference corresponds to the tautology $((p \rightarrow q) \land \neg q) \rightarrow \neg p$.

Modus tollens can also be derived from modus ponens: replace the conditional $p \rightarrow q$ by its logically equivalent contrapositive $\neg q \rightarrow \neg p$, which gives the tautology $((\neg q \rightarrow \neg p) \land \neg q) \rightarrow \neg p.$ This is exactly modus ponens applied to the conditional $\neg q \rightarrow \neg p$ and its hypothesis $\neg q$.

Next, here are the two fallacies. They are included because they are very common errors to be aware of and to avoid.

Inverse Error

$p \rightarrow q$
$\neg p$
$\therefore \neg q$

This fallacy arises by mistakenly treating the inverse $\neg p \rightarrow \neg q$ as if it were logically equivalent to $p \rightarrow q$. It is also called the “fallacy of the inverse” and “fallacy of denying the hypothesis.”

Converse Error

$p \rightarrow q$
$q$
$\therefore p$

This fallacy arises by mistakenly treating the converse $q \rightarrow p$ as if it were logically equivalent to $p \rightarrow q$. It is also called the “fallacy of the converse” and the “fallacy of affirming the consequent.”

Later in this chapter you will see how these four conditionals can be viewed as describing the subset relationship between two sets, which may help you recognize when either a converse error or an inverse error is being made.

6.1.3. Other Common Rules Of Inference

Any tautology of the form $p \rightarrow q$ can be used to define a rule of inference. In particular, we can define a rule of inference corresponding to the tautology $(p_{1} \land p_{2} \land \ldots \land p_{n})\rightarrow p_{1}$ for each integer $n \geq 1$. This means that there are at least as many possible tautologies as there are natural numbers! How do we deal with infinitely many tautologies?

In general, there is a small number of rules of inference that are used in most proofs. Proofs often are built up to a large size by applying just a few rules of inference multiple times.

Here are some of the more commonly-used rules of inference. In the remix author’s opinion, it’s better to practice using these rules of inference rather than to focus on memorizing them as formal rules with special names.

Proof by Cases

$p \rightarrow r$
$q \rightarrow r$
$p \lor q$
$\therefore r$

This rule of inference corresponds to the tautology $((p \rightarrow r) \land (q \rightarrow r) \land (p \lor q)) \rightarrow r$.

Elimination (Disjunctive Syllogism)

$p \lor q$
$\neg q$
$\therefore p$

This rule of inference corresponds to the tautology $((p \lor q) \land \neg q) \rightarrow p$.

Resolution

$p \rightarrow q$
$\neg p \rightarrow r$
$\therefore q \lor r$

This rule of inference corresponds to the tautology $((p \rightarrow q) \land (\neg p \rightarrow r)) \rightarrow (q \lor r)$.

Notice that this rule of inference can also be written as
$\neg p \lor q$
$p \lor r$
$\therefore q \lor r$
This form of resolution is important to automated theorem-proving.

Contradiction Rule

$\neg p \rightarrow (q \land \neg q)$
$\therefore p$

This rule of inference corresponds to the tautology $(\neg p \rightarrow (q \land \neg q)) \rightarrow p$.

Note that this tautology is often written in the alternate form $((\neg p \rightarrow q) \land (\neg p \rightarrow \neg q)) \rightarrow p$, which can be more useful in some contexts.
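
Each rule in this subsection can be checked by building the conditional "conjunction of the premises implies the conclusion" and testing every row of its truth table. The sketch below does this for four of the rules; the helper names and the `rules` dictionary are hypothetical.

```python
from itertools import product

def implies(a, b):
    """The conditional a -> b."""
    return (not a) or b

def is_tautology(formula, num_vars):
    """True when formula is True for every assignment of truth values."""
    return all(formula(*row) for row in product([True, False], repeat=num_vars))

# Each entry: (premises -> conclusion as a function, number of variables).
rules = {
    "proof by cases": (
        lambda p, q, r: implies(implies(p, r) and implies(q, r) and (p or q), r), 3),
    "elimination": (
        lambda p, q: implies((p or q) and not q, p), 2),
    "resolution": (
        lambda p, q, r: implies(implies(p, q) and implies(not p, r), q or r), 3),
    "contradiction rule": (
        lambda p, q: implies(implies(not p, q and not q), p), 2),
}

for name, (formula, n) in rules.items():
    print(name, is_tautology(formula, n))
```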

There are many more rules of inference we could write down and give names to. Instead, we’ll just list a few tautologies.

You Try

For each of the tautologies shown, write the argument form for the corresponding rule of inference. If needed, refer to Example 2 in this chapter to see how the argument, argument form, and tautology are related.

$(p \land q) \rightarrow p$
$p \rightarrow (p \lor q)$
$p \rightarrow (q \rightarrow (p \land q))$

6.2. Rules Of Inference for Quantified Statements

In this section, four rules of inference that apply to quantified predicates are presented. In all of these rules of inference, the values of the variable(s) are assumed to be restricted to a universal set $U.$

6.2.1. Rules of Inference for Universally-Quantified Predicates

Universal instantiation states that, from the premise that $\forall x P(x)$ is True, where $x$ ranges over all elements of the universal set $U,$ you can conclude that $P(c)$ must also be True, where $c \in U$ is any arbitrarily-chosen element of $U.$

Universal Instantiation (Universal Specification)

$\forall x P(x)$
$\therefore P(c) \text{ for any } c \in U$

Universal generalization states that, from the premise that $P(x)$ is True for every arbitrarily-chosen value of $x$ that is an element of the universal set $U,$ you can conclude $\forall x P(x)$ must also be True, where $x$ ranges over all elements of the universal set $U.$

Universal Generalization

$P(c) \text{ for every } c \in U$
$\therefore \forall x P(x)$

6.2.2. Rules of Inference for Existentially-Quantified Predicates

Existential instantiation states that, from the premise $\exists x P(x),$ you can conclude that there must be at least one $c \in U$ such that $P(c)$ is True. This allows you to pick a “constant” $c$ that makes the predicate $P(x)$ True instead of needing to refer repeatedly to the existential quantifier.

Existential Instantiation (Existential Elimination)

$\exists x P(x)$
$\therefore P(c) \text{ for some element } c \in U$

Existential generalization states that, from the premise that $P(c)$ is True for at least one $c \in U,$ you can conclude that $\exists x P(x)$ must also be True.

Existential Generalization

$P(c) \text{ for some element } c \in U$
$\therefore \exists x P(x)$

6.2.3. An Example Using Rules Of Inference For Predicates

Example 3 - An Argument That Involves A Predicate

The following argument was used as an example in the Logic chapter.

Sarah earned a B.S. in Computer Science.
Anyone who earned a B.S. in Computer Science must have earned a C or better in Discrete Math.
Therefore, Sarah earned a C or better in Discrete Math.

Notice that the second proposition above involves a universally-quantified predicate: The universal quantifier “Anyone” is applied to the predicate “$\rule{12mm}{.5pt}$ who earned a B.S. in Computer Science must have earned a C or better in Discrete Math.”

In this argument, you use one of the rules of inference for quantified predicates to create a proposition about Sarah from the universally-quantified predicate as follows.

Sarah earned a B.S. in Computer Science.
Anyone who earned a B.S. in Computer Science must have earned a C or better in Discrete Math.
If Sarah earned a B.S. in Computer Science, then Sarah must have earned a C or better in Discrete Math.
Therefore, Sarah must have earned a C or better in Discrete Math. That is, Sarah earned a C or better in Discrete Math.

This new argument is of the form

$P(c)$
$\forall x (P(x) \rightarrow Q(x))$
$P(c) \rightarrow Q(c)$
$\therefore Q(c)$

6.3. Explaining The Converse Error Using An Euler Diagram

The following image uses an Euler diagram of two sets $A \subseteq B$ to explain why the converse error is a fallacy.

To see why the converse error is a fallacy, consider the following question: If you are told that $c$ is an element of set $B$ in the preceding image, can you determine whether $c$ is an element of set $A$?

You Try

Describe how the Euler diagram can be used to explain why the inverse error is a fallacy.

6.4. Proof Techniques

In this section several examples of formal mathematical proofs are given to illustrate different proof techniques. Many of the techniques correspond to certain rules of inference that were discussed earlier in this chapter.

Each proof starts by stating a conjecture, which is a proposition with undetermined truth value. The goal of each proof is to determine the truth value of the relevant conjecture.

To simplify the description of the proof techniques, we’ll only consider the case where the proof has a single premise $p$. That is, we’ll always assume that our proof involves a single conditional $p \rightarrow q$. This may seem like an oversimplification, but it is not: We are simply renaming the conjunction $(p_{1} \land p_{2} \land \ldots \land p_{n})$ of all of the actual premises by using the single propositional variable $p$. That is, we are defining $p$ by the logical equivalence $p \equiv (p_{1} \land p_{2} \land \ldots \land p_{n}).$

Here we’ll present examples of proofs using several different techniques. Most of these proofs establish an arithmetic fact that you probably have always known (or assumed) is True. Because the facts themselves are familiar, you can focus on the form of the proof: Note the steps that are used, and how the argument flows.

Another important proof technique, mathematical induction, will be discussed in a later chapter.

Example 4 - Why do we need proofs at all?

Consider the following proposition: “For all natural numbers $n,$ the value of $n^{2} + n + 41$ is a prime integer.”

For each of the natural numbers 0, 1, …, 10, the predicate $P(n)$: “$n^{2} + n + 41$ is a prime integer.” evaluates to a proposition that is True. In fact, $P(n)$ evaluates to a proposition that is True for each of the natural numbers $n$ that is less than 40.

It may seem that we have “checked enough cases” to conclude that $P(n)$ will evaluate to a proposition that is True for every possible natural number value of $n.$ However, $P(40)$ is the proposition “$40^{2} + 40 + 41 \text{ is a prime integer}$,” which is False - notice that $40^{2} + 40 + 41 = 40 \cdot (40 + 1)+ 41 = 40 \cdot 41 + 41 = 41^2$ is a composite number.

This example shows that it is not enough to verify that a proposition or predicate is True for just a few cases, unless those cases happen to cover every possibility.
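The search for the first failing case can be automated. The following Python code is a minimal sketch; the function names `is_prime` and `first_failure` are illustrative, and the primality test is simple trial division, which is adequate only for small inputs like these.

```python
def is_prime(m):
    """Trial-division primality test for a small integer m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def first_failure(limit):
    """Least natural number n < limit for which n^2 + n + 41 is not prime.

    Returns None if no such n exists below limit.
    """
    for n in range(limit):
        if not is_prime(n * n + n + 41):
            return n
    return None


print(first_failure(40))   # None: P(0), ..., P(39) are all True
print(first_failure(100))  # 40: P(40) is False
```

Notice that running the check for only the first 40 natural numbers finds no failure at all, which is exactly the trap described in this example.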

6.4.1. Direct Proof

In a direct proof, we make an argument that a conditional statement $p \rightarrow q$ must be True. This means that we can assume that the premise $p$ is True and apply modus ponens to prove that $q$ must be True, too.

Theorem

If $a,$ $b,$ and $c$ are integers such that both $a$ and $b$ are divisible by $c,$ then for any integers $m$ and $n$ the integer $ma+nb$ must be divisible by $c.$
This statement can be rephrased as “If the integers $a$ and $b$ are multiples of the integer $c$, then any sum of integer multiples of $a$ and $b$ must also be a multiple of $c.$”

Proof

Before starting the formal proof, let’s look at a specific example. The integers 10, 6, and 2 are such that both 10 and 6 are divisible by 2. Suppose we have a pair of integers that we will use as multipliers, say 11 and 7, to form a new number $(11)(10)+(7)(6).$ Is the new number also divisible by 2? The answer is obviously "Yes" since we can just simplify the sum and divide by 2, but how can we justify this for every choice of the pair of multipliers? Notice that we can factor 2 out of 10 and 6 in the sum that defines our new number: \[ (11)(10) + (7)(6) = (11)(5)(2) + (7)(3)(2) = [(11)(5)+(7)(3)](2) \] We can find the common factor of 2 and "factor it out" as if it were a variable. This appears to work no matter what values are chosen for the first three integers and the pair of multipliers, so should be generalizable to a formal proof.

For the formal proof, start by supposing that $a,$ $b,$ and $c$ are integers such that both $a$ and $b$ are divisible by $c.$ This means that there are integers $q$ and $t$ such that \[ a=qc \text{ and } b=tc \]

For any integers $m$ and $n$, we can rewrite the expression $ma+nb$ as \[ ma+nb = m(qc)+n(tc) = (mq)c + (nt)c = (mq+nt)c \]

The last part of the extended equality shows that $ma+nb$ is a multiple of $c,$ that is, $ma+nb$ is divisible by $c.$

This shows that the statement of the theorem is a True proposition.

Q.E.D.

You Try

Write a proof of the following statement: If $a,$ $b,$ and $c$ are integers such that $a$ is divisible by $b$ and $b$ is divisible by $c,$ then $a$ must be divisible by $c.$

6.4.2. Proof By Contraposition

In a proof by contraposition, we make an argument that $p \rightarrow q$ is True by instead first arguing that its contrapositive $\neg q \rightarrow \neg p$ is True and secondly applying the logical equivalence of the conditional and its contrapositive. Start by assuming that the premise $\neg q$ is True and apply modus ponens to prove that $\neg p$ must be True, too, then apply logical equivalence.

Theorem

Let $n$ represent an integer.

If $n^{2}$ is even, then the integer $n$ is even.

Proof

It is easier to prove "if it’s not the case that the integer $n$ is even, then it’s not the case that $n^{2}$ is even," so start by supposing that it’s not the case that the integer $n$ is even.

This means that $n$ must be odd, so there is an integer $q$ such that \[ n = 2q + 1\]

This in turn means that \[ n^{2} = (2q + 1)^{2} = 4q^{2} + 4q + 1 = 2(2q^{2} + 2q) + 1\]

The last part of the extended equality shows that $n^{2}$ is odd: When $n^{2}$ is divided by 2, the remainder is 1. That is, $n^2$ is not even.

This shows that the contrapositive of the statement of the theorem, "if the integer $n$ is not even, then $n^{2}$ is not even" is a True proposition. Since every conditional and its contrapositive are logically equivalent, this argument proves that "If $n^{2}$ is even, then the integer $n$ is even" is a True proposition.

Q.E.D.

You Try

Prove the following statement: If $n^{2}$ is odd, then the integer $n$ is odd.

6.4.3. Proof By Counterexample

In a proof by counterexample, we disprove a proposition of the form $(\forall x \in D) P(x)$ by arguing that there is at least one value $c \in D$ such that $\neg P(c)$ is True.

Conjecture

For every natural number $n,$ there are natural numbers $a,$ $b,$ and $c$ such that $n = a^{2} + b^{2} + c^{2}.$

Disproof of Conjecture

In this case, we can simply compute values of the expression $a^{2} + b^{2} + c^{2}$ until we find a "gap." \[ 0^{2} + 0^{2} + 0^{2} = 0 \] \[ 1^{2} + 0^{2} + 0^{2} = 1 \] \[ 1^{2} + 1^{2} + 0^{2} = 2 \] \[ 1^{2} + 1^{2} + 1^{2} = 3 \] \[ 2^{2} + 0^{2} + 0^{2} = 4 \] \[ 2^{2} + 1^{2} + 0^{2} = 5 \] \[ 2^{2} + 1^{2} + 1^{2} = 6 \] \[ 2^{2} + 2^{2} + 0^{2} = 8 \] \[ 2^{2} + 2^{2} + 1^{2} = 9 \] \[ 3^{2} + 0^{2} + 0^{2} = 9 \] Notice that you cannot write 7 as a sum of three squares of natural numbers: each of $a,$ $b,$ and $c$ would have to be at most 2 (because $3^{2} = 9 > 7$), and the list above, together with $2^{2} + 2^{2} + 2^{2} = 12,$ covers every such combination. (There may be other numbers that cannot be written in this form, too, but 7 is the least such number and we only need to find one counterexample.)
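The hand computation above can be mirrored by a short exhaustive search. The following Python code is a sketch under the assumption that it is enough to try values of $a,$ $b,$ and $c$ from 0 up to the integer square root of $n$ (any larger value has a square greater than $n$). The function names are illustrative.

```python
from itertools import combinations_with_replacement
from math import isqrt


def is_sum_of_three_squares(n):
    """Return True if n = a^2 + b^2 + c^2 for some natural numbers a, b, c."""
    candidates = range(isqrt(n) + 1)  # a, b, c cannot exceed sqrt(n)
    return any(
        a * a + b * b + c * c == n
        for a, b, c in combinations_with_replacement(candidates, 3)
    )


def smallest_counterexample(max_n):
    """Least n <= max_n that is not a sum of three squares, or None."""
    for n in range(max_n + 1):
        if not is_sum_of_three_squares(n):
            return n
    return None


print(smallest_counterexample(20))  # 7
```

Because the search for a fixed $n$ covers every possible choice of $a,$ $b,$ and $c,$ the result for $n = 7$ is a complete check of that single case, which is all a counterexample requires.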

This proves the negation of the Conjecture, namely

Theorem

There exists at least one natural number $n,$ such that for all natural numbers $a,$ $b,$ and $c,$ $n \neq a^{2} + b^{2} + c^{2}.$

This may seem like a strange conjecture to consider until you find out that every natural number $n$ can be written as $n = a^{2} + b^{2} + c^{2} + d^{2}$ for some natural numbers $a,$ $b,$ $c,$ and $d,$ and that this may have been known about 1,800 years ago.

6.4.4. Proof by Contradiction

In a proof by contradiction, we disprove the proposition $\neg p$ by making an argument that the conditional $\neg p \rightarrow (q \land \neg q)$ must be True for some proposition $q$ and apply the Contradiction Rule to conclude that $p$ must be True.
Note: Sometimes, we argue instead that the proposition $((\neg p \rightarrow q) \land (\neg p \rightarrow \neg q))$ must be True, and use the fact that this proposition is logically equivalent to $\neg p \rightarrow (q \land \neg q)$ and apply the Contradiction Rule.

Theorem

There are no positive integers $a$ and $b$ such that $\displaystyle{ \left( \frac{a}{b} \right)^{2} } = 2.$

Proof

It may be helpful to write the proposition out in symbols: \[ \neg (\exists a \in \mathbb{Z}) (\exists b \in \mathbb{Z}) \left( a > 0 \land b>0 \land \left( \frac{a}{b} \right)^{2} = 2 \right) \]

To prove this proposition by contradiction, we ASSUME that its negation is True, that is we use the premise \[ \text{ Premise: } \neg \neg (\exists a \in \mathbb{Z}) (\exists b \in \mathbb{Z}) \left( a > 0 \land b>0 \land \left( \frac{a}{b} \right)^{2} = 2 \right) \] In words, we assume: There are integers $a$ and $b$ such that $a > 0,$ $b > 0,$ and $\displaystyle{ \left( \frac{a}{b} \right)^{2} } = 2.$

We know that we can reduce the fraction so that $a$ and $b$ have no common prime factors. (You know how to do this - just divide both numerator and denominator by their greatest common divisor. In fact, we will prove that this can be done in the mathematical induction chapter later in the textbook, but for now just treat it like a "known fact".)

To eliminate the fraction we can rewrite the equation as \[ a^{2} = 2 b^{2} \]

From this new equation, $a^{2}$ must be divisible by 2. Use the theorem we proved earlier in this section to conclude that $a$ must be divisible by 2, too. This means that there is an integer $q$ such that $a = 2q.$ Substitute the last expression for $a$ in the equation to get \[ (2q)^{2} = 2 b^{2} \] which we can rewrite as \[ 4 q^{2} = 2 b^{2} \] or \[ 2 q^{2} = b^{2} \] From this equation, $b^{2}$ is divisible by 2, and we can conclude that $b$ is divisible by 2, too.

So… we have positive integers $a$ and $b$ that have no common prime factors, and we have proven that $a$ and $b$ have a common prime factor, namely 2. We have arrived at a contradiction.

Apply the Contradiction Rule to infer that the premise must be False.

We have proven the following theorem.

Theorem

There are no positive integers $a$ and $b$ such that $\displaystyle{ \left( \frac{a}{b} \right)^{2} } = 2.$

That is, the square root of 2 is not a rational number.

Here is another example of proof by contradiction.

Theorem - Generalized Pigeonhole Principle

Suppose that $n$ and $k$ are positive integers. If each of $n$ objects is assigned to one of $k$ categories, then at least one category contains at least $\displaystyle{\left\lceil \frac{n}{k} \right\rceil}$ objects.

Proof

First, recall that $\lceil x \rceil$ is the ceiling function whose output is the least integer that is greater than or equal to $x$ (that is, $\lceil x \rceil$ rounds a real number $x$ up to the next greatest integer.) The graph of the function is shown in section 3 of this appendix.

Next, before starting a formal proof, let’s look at a specific example to understand what we need to prove. Suppose we want to assign 13 people to the 3 categories "high school student," "post-secondary student," and "other". It’s not hard to see that when we assign people to the categories, the sum of the numbers in the categories has to be 13, so at least one of the categories has at least $5 = \displaystyle{\left\lceil \frac{13}{3} \right\rceil}$ people. That is, if each category had at most $4 = \displaystyle{\left\lceil \frac{13}{3} \right\rceil - 1}$ people, then the sum of those numbers would be at most $(3)(4) = 12,$ but we know that the sum should be equal to 13 - this is a contradiction. You can formalize this argument to use $n$ and $k$ instead of the specific values 13 and 3, which will provide a formal proof using the "proof by contradiction" technique.

Now, to prove this proposition by contradiction, suppose that the conditional is False. That is, we assume that

It is True that $n$ and $k$ are positive integers.
It is True that each of $n$ objects has been assigned to one of $k$ categories.
It is False that "At least one category contains at least $\displaystyle{\left\lceil \frac{n}{k} \right\rceil}$ objects."

So we are assuming that the negation of the last bulleted statement must be True, that is "Every category contains fewer than $\displaystyle{\left\lceil \frac{n}{k} \right\rceil}$ objects" is True. You should verify that this is the correct negation by using De Morgan’s laws for quantifiers.

Label the categories with the integers $1, 2, \ldots, k$ and let the integers $c_1, c_2, \ldots, c_k$ be the counts of objects in each of the categories, that is, assume that the number of objects assigned to category $i$ is $c_i.$ From the assumption, every $c_i$ is less than or equal to $\displaystyle{\left\lceil \frac{n}{k} \right\rceil} - 1.$ The total number of objects assigned to categories is therefore \[ c_1 + c_2 + \cdots + c_k \leq k \left( \displaystyle{\left\lceil \frac{n}{k} \right\rceil} - 1 \right) \]

For any real number $x,$ it is true by definition of the ceiling function that $\displaystyle{x \leq \left\lceil x \right\rceil < x+1}.$

This means that \[ c_1 + c_2 + \cdots + c_k \leq k \left( \displaystyle{\left\lceil \frac{n}{k} \right\rceil} - 1 \right) < k \left( \displaystyle{\left( \frac{n}{k} + 1 \right)} - 1 \right) \] and the expression on the right simplifies to $n.$

So the number of objects assigned to the categories must be strictly less than $n,$ but we also have as a premise that all $n$ objects were assigned. This is a contradiction.

Apply the Contradiction Rule to infer that it is False that the last bulleted statement is False, that is, conclude that the conditional statement of the theorem must be True.

We have proven the theorem.

Theorem

Suppose that $n$ and $k$ are positive integers.

If each of $n$ objects is assigned to one of $k$ categories, then at least one category contains at least $\displaystyle{\left\lceil \frac{n}{k} \right\rceil}$ objects.
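As a sanity check (not a substitute for the proof), you can confirm the theorem for small values of $n$ and $k$ by listing every possible assignment. The following Python code is a minimal sketch; the names `ceiling_div` and `pigeonhole_holds` are illustrative, and the ceiling is computed with integer arithmetic to avoid floating-point rounding.

```python
from collections import Counter
from itertools import product


def ceiling_div(n, k):
    """Ceiling of n / k for positive integers n and k."""
    return -(-n // k)


def pigeonhole_holds(n, k):
    """Check every assignment of n objects to k categories.

    Returns True if, in each assignment, some category contains at least
    ceiling(n / k) objects.
    """
    bound = ceiling_div(n, k)
    for assignment in product(range(k), repeat=n):
        counts = Counter(assignment)  # objects per category
        if max(counts.values()) < bound:
            return False
    return True


print(ceiling_div(13, 3))  # 5
print(all(pigeonhole_holds(n, k) for n in range(1, 7) for k in range(1, 4)))
```

The number of assignments is $k^{n},$ so this check is only practical for small values. The proof above is what covers every pair of positive integers $n$ and $k.$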

6.4.5. Proof By Exhaustion (Proof By Cases)

Sometimes it is convenient to break a proof into a finite number of cases. For example, it may be easier to prove a statement that involves an integer by considering a first case where the integer is odd and a separate second case where the integer is even, then combining the two separate cases to create a single proof for all integers $n.$

In a general proof by cases, you make an argument that a conditional statement $(p_{1} \lor \cdots \lor p_{n}) \rightarrow r$ must be True. This means that if any one of the "cases" $p_{i},$ where $i \in \{ 1, 2, \ldots , n \},$ is True, you can apply the tautology $p_{i} \rightarrow (p_{1} \lor \cdots \lor p_{n})$ and the transitivity rule of inference to prove that $r$ must be True, too.

A proof by exhaustion is a special kind of proof by cases where the premise is of the form $(p_{1} \lor \cdots \lor p_{n} \lor \neg (p_{1} \lor \cdots \lor p_{n}) ).$ Notice that this premise must be True since if all of $p_{1},$ $p_{2},$… $p_{n}$ are False then $\neg (p_{1} \lor \cdots \lor p_{n})$ is True.

If there are two cases, proof by exhaustion corresponds to using the "proof by cases" rule of inference discussed above in the section "Other Common Rules Of Inference." The tautology can be rewritten in the simpler form $((p \rightarrow r) \land (\neg p \rightarrow r)) \rightarrow r$ because $p \lor \neg p$ must always be True.

If there are more than two cases, this corresponds to using the tautology $((p_{1} \rightarrow r) \land \cdots \land (p_{n} \rightarrow r) \land (\neg (p_{1} \lor \cdots \lor p_{n}) \rightarrow r)) \rightarrow r$.

Example 5 - Working with multiple cases

Let’s prove the following theorem: \[ \text{If $n$ is an integer, then $n(n+1)$ is an even integer.} \]

Proof

Consider the following two cases:

Case 1: $n$ is an odd integer.
In this case, $n+1$ must be an even integer, and $n(n+1)$ is the product of an odd integer and an even integer so must be even. (Note that this could be made more formal by stating that there is some integer $j$ such that $n+1 = 2j$ so that $n(n+1) = n(2j) = (2j)n = 2(jn).$)
Case 2: $n$ is an even integer.
In this case, $n+1$ must be an odd integer, and $n(n+1)$ is the product of an even integer and an odd integer so must be even. (Note again that this could be made more formal by stating that there is some integer $k$ such that $n = 2k$ so that $n(n+1) = (2k)(n+1) = 2((k)(n+1)).$)

Since the statement "$n$ is an odd integer or $n$ is an even integer" must be True no matter what value the integer $n$ has, this shows that the statement of the theorem is a True proposition.

Q.E.D.

7. Sequences and Recursion

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on October 12, 2025.
Edited definition of recursively-defined function.
Added material to the section about the Towers Of Hanoi.

Sequences are functions with domain a nonempty subset of the natural numbers. That is, sequences are ordered lists of objects indexed by some or all of the natural numbers. The indexed objects in the list are called the terms of the sequence and may be any kind of object - numbers, sets, functions, strings, steps of a proof, steps of an algorithm, etc.

Recursion is a process that you can use to define an object, compute a value, or describe the construction of an object or set of objects, by using a sequence of steps where each step after the initial step refers to one or more previously completed steps.

Key terms and concepts covered in this chapter:

Sequences
Recursive mathematical definitions
- Recursive definitions of sequences and functions
  - Factorials
  - Arithmetic and geometric progressions
  - The Fibonacci sequence (also called the Fibonacci numbers)
  - Other sequences and functions
- Recursive definitions of sets of objects (e.g., rooted trees, valid Java identifiers)
- The "Towers of Hanoi" game
Recurrence relations
- Solving recurrence relations

7.1. Sequences

A sequence is a function $s$ from a nonempty subset of the natural numbers $\mathbb{N} = \{ 0, 1, 2, 3, \ldots \}$ to a set $C.$ That is, the domain of the sequence is some set of nonnegative integers, and the codomain can be any set. Each "input" value of $n$ in the domain is called an index. The "outputs" of the sequence are called the terms of the sequence. A term is usually denoted by $s_{n}$ instead of the function notation $s(n),$ but the meaning is the same: The output value that corresponds to the input $n.$

7.1.1. Sequences of Numbers

Two common ways to describe or define a sequence of numbers are

a single formula, called a closed form for the sequence, that can be used to compute a term from the value of the index $n,$ or
a recursive rule that includes
- stating the values of the first few terms of the sequence, called the initial value(s) of the sequence, and
- a recurrence relation that describes how the $n$th term of the sequence $a_{n}$ can be computed using one or more terms that have index less than $n.$

In this subsection, several examples of sequences of numbers are presented. You may have seen some of these sequences in your previous mathematics experience but others may be new to you.

An arithmetic sequence is a sequence of numbers generated by a linear expression.

Example 1 - Arithmetic Sequences

Consider the sequence $a_n=3n+1$ defined for all natural numbers $n.$ The first 5 terms of this sequence are shown below.

$a_0=3\left(0\right)+1=1$

$a_1=3\left(1\right)+1=4$

$a_2=3\left(2\right)+1=7$

$a_3=3\left(3\right)+1=10$

$a_4=3\left(4\right)+1=13$

Notice that this sequence can be defined recursively using a recurrence relation: \[a_0=1 \text{ and } a_n=a_{n-1}+3 \text{ for } n \in \mathbb{N}_{>0}.\] So

$a_0=1$

$a_1=a_{1-1}+3=a_0+3=1+3=4$

$a_2=a_{2-1}+3=a_1+3=4+3=7$

$a_3=a_{3-1}+3=a_2+3=7+3=10$

$a_4=a_{4-1}+3=a_3+3=10+3=13$

Notice that an arithmetic sequence is determined by an initial term and a common difference. The arithmetic sequence $a_n=3n+1$ is the sequence with initial term $a_0=1$ and common difference $d=3$. The general arithmetic sequence is $a_n=c + d\cdot n$ with initial term $c$ and common difference $d.$
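The two descriptions of an arithmetic sequence can be compared side by side in code. The following Python code is a minimal sketch; the function names and the default arguments `c=1` and `d=3` (taken from Example 1) are illustrative choices.

```python
def arithmetic_closed(n, c=1, d=3):
    """Closed form: a_n = c + d * n."""
    return c + d * n


def arithmetic_recursive(n, c=1, d=3):
    """Recursive rule: a_0 = c and a_n = a_(n-1) + d for n > 0."""
    if n == 0:
        return c
    return arithmetic_recursive(n - 1, c, d) + d


print([arithmetic_closed(n) for n in range(5)])     # [1, 4, 7, 10, 13]
print([arithmetic_recursive(n) for n in range(5)])  # [1, 4, 7, 10, 13]
```

The closed form computes a term in one step, while the recursive rule has to work its way back to the initial term $a_0.$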

A geometric sequence is a sequence of numbers generated by an exponential expression.

Example 2 - Geometric Sequences

Consider the sequence $b_n=2\cdot\ 3^n$ defined for all natural numbers $n.$ The first 5 terms of this sequence are shown below.

$b_0=2\cdot\ 3^0=2$

$b_1=2\cdot\ 3^1=6$

$b_2=2\cdot\ 3^2=18$

$b_3=2\cdot\ 3^3=54$

$b_4=2\cdot\ 3^4=162$

Notice that this sequence can be defined recursively using a recurrence relation: \[b_0=2 \text{ and } b_n=3\cdot b_{n-1} \text{ for } n \in \mathbb{N}_{>0}.\] So

$b_0=2$

$b_1=3\cdot b_{1-1}=3\cdot2=6$

$b_2=3\cdot b_{2-1}=3\cdot6=18$

$b_3=3\cdot b_{3-1}=3\cdot18=54$

$b_4=3\cdot b_{4-1}=3\cdot54=162$

Notice that a geometric sequence is determined by an initial term and a common ratio. The geometric sequence $b_n=2\cdot\ 3^n$ is the sequence with initial term $b_0=2$ and common ratio $r=3$. The general geometric sequence is $b_n=c \cdot r^n$ with initial term $c$ and common ratio $r.$
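A recurrence relation like this one can also be applied with a loop that carries the most recent term forward. The following Python code is a sketch of that idea; the function name `geometric_terms` and its default arguments `c=2` and `r=3` (taken from Example 2) are illustrative.

```python
def geometric_terms(count, c=2, r=3):
    """First `count` terms of the sequence b_0 = c, b_n = r * b_(n-1)."""
    terms = []
    term = c  # initial term b_0
    for _ in range(count):
        terms.append(term)
        term = r * term  # apply the recurrence relation once
    return terms


print(geometric_terms(5))                # [2, 6, 18, 54, 162]
print([2 * 3 ** n for n in range(5)])    # closed form gives the same terms
```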

The factorial is usually defined using a recurrence relation, but with its own notation that does not use a subscript for the index.

Example 3 - The Factorial Function

The factorial of a natural number can be treated as a term of a sequence, called the factorial function. The commonly-used notation for a term of this sequence is $n!$ and the sequence is defined as \[n! = n \cdot (n-1) \cdots 2 \cdot 1\] with $1! = 1$ and $0! = 1.$

Notice that the factorial can be defined a bit more precisely by using the recurrence relation \[0!=1 \text{ and } n!=n\cdot (n-1)! \text{ for } n \in \mathbb{N}_{>0}.\] So the first few values of the factorial are

$0!=1$

$1!=1\cdot (1-1)!=1\cdot1=1$

$2!=2\cdot (2-1)!=2\cdot1=2$

$3!=3\cdot (3-1)!=3\cdot2=6$

$4!=4\cdot (4-1)!=4\cdot6=24$
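The recurrence relation for the factorial translates almost word for word into a recursive function. The following Python code is a minimal sketch; the function name `factorial` is an illustrative choice (Python's standard library also provides `math.factorial`).

```python
def factorial(n):
    """Compute n! using 0! = 1 and n! = n * (n-1)! for n > 0."""
    if n < 0:
        raise ValueError("n must be a natural number")
    if n == 0:
        return 1  # initial value
    return n * factorial(n - 1)  # recurrence relation


print([factorial(n) for n in range(5)])  # [1, 1, 2, 6, 24]
```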

The next sequence is the very famous Fibonacci numbers, named for the Italian mathematician Fibonacci who was also known as Leonardo of Pisa. However, this sequence and its properties were known and discussed by Indian poets and mathematicians such as Pingala, Virahanka, and Hemachandra before Fibonacci was born. The sequence became known to Europeans when Fibonacci used the sequence in his 1202 book Liber Abaci to solve a counting problem that involves breeding pairs of rabbits.
The book Liber Abaci ("Book of Calculation") is also credited with popularizing the use of the base-ten Hindu-Arabic numeral system in Europe.

Example 4 - The Fibonacci Numbers

The Fibonacci sequence can be defined recursively as \[f_{0}=0, \, f_{1}=1, \text{ and } f_{n} = f_{n-1} + f_{n-2} \text{ for } n \geq 2.\]

$f_{2} = f_{2-1} + f_{2-2} = f_{1} + f_{0} = 1+0 = 1$

$f_{3} = f_{3-1} + f_{3-2} = f_{2} + f_{1} = 1+1 = 2$

$f_{4} = f_{4-1} + f_{4-2} = f_{3} + f_{2} = 2+1 = 3$

$f_{5} = f_{5-1} + f_{5-2} = f_{4} + f_{3} = 3+2 = 5$

$f_{6} = f_{6-1} + f_{6-2} = f_{5} + f_{4} = 5+3 = 8$

$f_{7} = f_{7-1} + f_{7-2} = f_{6} + f_{5} = 8+5 = 13$

Note: The definition used in this textbook matches the ones used in several textbooks, but be warned that others may use definitions that are slightly different (e.g., some sources state the initial values as $f_{0}=1$ and $f_{1}=1$).

7.1.2. Non-numerical Sequences

As mentioned above, the terms of a sequence can be any object. Here are some examples.

Example 5 - A sequence of functions

Consider the sequence $p_{n}(x)$ of functions that are defined for real number inputs $x.$

$p_{0}(x) = 1,$ that is, the constant function 1,

$p_{1}(x) = x,$

$p_{2}(x) = x^{2},$

$p_{3}(x) = x^{3},$

and in general,

$p_{n}(x) = x^{n}.$

This is the sequence of $n$th power functions. The subscript of each of the functions matches the power that the input, $x,$ will be raised to.

Notice that we can define the sequence recursively by \[p_{0}(x)=1 \text{ and } p_{n}(x) = x \cdot p_{n-1}(x) \text{ for } n \geq 1.\]

The ordered list of steps used in an algorithm is a sequence.

Example 6 - An algorithm for long division

Task: Given two positive integers a and b, compute the quotient q and remainder r so that
$a = q \cdot b + r$ and $0 \leq r < b.$
Input: Two positive integers a and b
Steps:
1. Get the input values a and b.
2. Set r equal to a and set q equal to 0.
3. If r is less than b, skip to Step 5.
4. Set r equal to r - b and add 1 to q.
5. If r is greater than or equal to b, then go back to Step 3.
6. Return the output values q and r, and stop.
Output: Integers q and r such that both $a = q \cdot b + r$ and $0 \leq r < b.$
- q is the quotient, that is, the number of times Step 4 was executed.
- r is the remainder, that is, the result of the last execution of Step 4 (or the value set in Step 2 in cases where Step 4 is never executed).
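The steps above can be sketched in Python as follows. The function name `divide` is an illustrative choice, and the `while` loop plays the role of the "test, subtract, and go back" pattern in the numbered steps.

```python
def divide(a, b):
    """Long division of positive integers by repeated subtraction.

    Returns (q, r) with a == q * b + r and 0 <= r < b.
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    r = a  # the remainder starts as the whole dividend
    q = 0  # no subtractions have been performed yet
    while r >= b:
        r = r - b  # subtract one more copy of b
        q = q + 1  # and count it
    return q, r


print(divide(17, 5))  # (3, 2) because 17 = 3 * 5 + 2
print(divide(3, 5))   # (0, 3) because no subtraction is possible
```

The results agree with Python's built-in `divmod(a, b)` for positive integers, although `divmod` does not work by repeated subtraction.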

7.2. Recursion

A recursive definition of a class of objects consists of two steps.

Basis Step: Specify the foundational (usually, the simplest) objects in the class of objects.
Recursion: Describe how to build new objects from one or more already-constructed objects in the class of objects.

7.2.1. Recursively-Defined Structures

For some mathematical objects, it is easier to describe the construction of the objects using a recursive definition.

You may recall that well-formed formulae were defined in the Logic chapter using a recursive definition. We formalize that definition here.

Example 7 - Well-Formed Formulae

The set of well-formed formulae is defined recursively as follows:

Basis Step

A propositional variable is a well-formed formula.

Recursion

We can construct new well-formed formulae from already-constructed well-formed formulae as follows. Suppose that $\alpha$ and $\beta$ are already-constructed well-formed formulae. We can construct the following new well-formed formulae:

$\left( \neg \alpha \right)$
$\left( \alpha \land \beta \right)$
$\left( \alpha \lor \beta \right)$
$\left( \alpha \rightarrow \beta \right)$
$\left( \alpha \leftrightarrow \beta \right)$

In the next example, we describe how to construct rooted trees, a type of graph.

Example 8 - Rooted Trees

A rooted tree is a type of graph. Graphs are described informally in the Introducing Discrete Mathematics chapter.

The set of rooted trees is defined recursively as follows.

Basis Step

A single vertex $r$ is a rooted tree. The vertex $r$ is called the root node of this rooted tree.

Recursion

You can construct a new rooted tree from already-constructed rooted trees as follows. Suppose you have a nonempty finite set of “old” rooted trees (that is, already-constructed rooted trees) such that
(1) no vertex is in more than one of the old rooted trees and
(2) no edge has endpoints in two different old rooted trees.
To construct the new rooted tree, first create a new vertex $r$ that is not a vertex of any of the old rooted trees, then create one new edge from the new vertex $r$ to the root node of each of the old rooted trees. The root node of the new rooted tree is the new vertex $r.$

The preceding image shows the basis step and represents, in part, the results of the first and second uses of the recursion step. In the image, the new root nodes created at each recursion step appear at the top of the newly-created rooted trees. In each rooted tree, edges are treated as if they are directed “down,” away from the new root node; this will be discussed in more detail in the Trees chapter.

Notice that infinitely-many rooted trees are constructed at each use of the recursion step, so we cannot show all the rooted trees produced at any step other than the basis step. Also, we would need to complete infinitely many steps to construct all possible rooted trees, but any one particular rooted tree you want to construct will be produced after only finitely many uses of the recursion step.
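The recursive definition of rooted trees can be mirrored by a small data structure. The following Python code is a minimal sketch under some simplifying assumptions: each `RootedTree` object stands for a root node together with the old rooted trees attached below it, edges are implied by membership in the `subtrees` list, and every call to `single_vertex` or `join` creates a fresh vertex, so the "no shared vertex" condition holds as long as each old tree is used only once. All of the names are illustrative.

```python
class RootedTree:
    """A root node together with the rooted trees attached below it."""

    def __init__(self, subtrees=()):
        self.subtrees = list(subtrees)


def single_vertex():
    """Basis step: a single vertex is a rooted tree."""
    return RootedTree()


def join(old_trees):
    """Recursion step: a new root with one edge to each old root node."""
    old_trees = list(old_trees)
    if not old_trees:
        raise ValueError("the recursion step needs at least one old rooted tree")
    return RootedTree(old_trees)


def count_vertices(tree):
    """Count vertices recursively: the root plus the vertices of each subtree."""
    return 1 + sum(count_vertices(s) for s in tree.subtrees)


def height(tree):
    """Number of recursion steps needed to build this rooted tree."""
    if not tree.subtrees:
        return 0
    return 1 + max(height(s) for s in tree.subtrees)


first = join([single_vertex(), single_vertex()])  # one use of the recursion step
second = join([first, single_vertex()])           # a second use
print(count_vertices(second), height(second))     # 5 2
```

Notice that `count_vertices` and `height` are themselves defined recursively, following the same basis step and recursion step as the definition of the trees.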

7.2.2. Recursively-Defined Functions

A recursively defined function has two parts:

Basis Step: Specify the value of the function at one or more small input values.
Recursion Step: Give a rule for computing the function’s output value at an integer based on the output values at one or more smaller integers.

A recursive definition of a function is similar to a recurrence relation, but uses function notation.

Example 9

Consider again the Fibonacci numbers, but this time given by a function $f(n)$ where $f(0)=0$, $f(1)=1$ and $f(n)=f(n-1)+f(n-2)$ for integers $n \geq 2.$

Applying the formula gives \begin{align*} f(2)&=f(1)+f(0)=1+0=1\\ f(3)&=f(2)+f(1)=1+1=2\\ f(4)&=f(3)+f(2)=2+1=3\\ f(5)&=f(4)+f(3)=3+2=5\\ f(6)&=f(5)+f(4)=5+3=8\\ \end{align*} Thinking of this as a recurrence relation, we would write $f_0=0,$ $f_1=1,$ and $f_n=f_{n-1}+f_{n-2},$ which generates the sequence $0, 1, 1, 2, 3, 5, 8, \ldots$.
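A recursively-defined function like this one can be written directly as a recursive function in code. The following Python code is a minimal sketch; the use of `functools.lru_cache` to remember previously computed values is a design choice added here (it is not part of the mathematical definition) so that each value $f(n)$ is computed only once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f(n):
    """Fibonacci function: f(0) = 0, f(1) = 1, f(n) = f(n-1) + f(n-2)."""
    if n < 0:
        raise ValueError("n must be a natural number")
    if n < 2:
        return n  # basis step: f(0) = 0 and f(1) = 1
    return f(n - 1) + f(n - 2)  # recursion step


print([f(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```

Without the cache, the function would still be correct, but it would recompute the same values many times because both `f(n - 1)` and `f(n - 2)` lead back to the same smaller inputs.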

7.3. Solving Recurrence Relations

Recall from earlier in this chapter that a recurrence relation is used to recursively define a sequence of numbers, based on one or more initial conditions, that is, the value(s) of the lowest-indexed term(s).

The phrase "solving a recurrence relation" means finding a closed form that defines the same sequence as the recurrence relation.

Example 10

Solve the recurrence relation $a_n=a_{n-1}+3$ when $a_1=2$.

Solution:

We are looking for a closed formula, so we will successively apply the recurrence relation until we see a pattern. \begin{align*} a_2&=a_1+3=2+3\\ a_3&=a_2+3=(2+3)+3 =2+3\cdot 2\\ a_4&=a_3+3=(2+2\cdot 3)+3=2+3\cdot 3\\ \vdots\\ a_n&=a_{n-1}+3=(2+3(n-2))+3=2+3(n-1)\\ \end{align*} So our closed formula is $a_n=2+3(n-1)$.
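A closed formula found by spotting a pattern can be spot-checked against the recurrence relation. The sketch below compares the two for the first few terms; the function names are chosen here for illustration, and agreement on finitely many terms is evidence, not a proof.

```python
def a_recursive(n: int) -> int:
    """a_n from the recurrence a_n = a_{n-1} + 3 with a_1 = 2."""
    if n == 1:
        return 2
    return a_recursive(n - 1) + 3


def a_closed(n: int) -> int:
    """a_n from the closed formula a_n = 2 + 3(n - 1)."""
    return 2 + 3 * (n - 1)


# The two definitions agree on every index that is checked.
print(all(a_recursive(n) == a_closed(n) for n in range(1, 50)))  # True
```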

You Try

Solve the recurrence relation $b_n=3b_{n-1}$ when $b_1=5$.

There are techniques used to solve certain classes of recurrence relations. For now, we will focus on only one case, the class of second-order linear homogeneous recurrence relations.

Example 11 - Solving a second-order linear homogeneous recurrence relation

Consider the recurrence relation $a_n= b \cdot a_{n-1} + c \cdot a_{n-2}$ where $b$ and $c$ are constants and the initial values $a_0$ and $a_1$ will be ignored for now.

Notice that you can find at least one solution of the form $a_n = r^{n}$ for a "suitable" nonzero value of $r.$ A "suitable" value of $r$ can be found by stating that you want the following equation to be True for all natural numbers $n \geq 2$ (and showing that such a "suitable" value of $r$ actually exists!) \[ r^{n} = b \cdot r^{n-1} + c \cdot r^{n-2} \]

Dividing both sides by $r^{n-2}$ (which is allowed because $r \neq 0$) shows that this is equivalent to $r^{2} = b \cdot r^{1} + c \cdot r^{0}$ or more simply \[r^{2} = b \cdot r + c\] and you can solve for $r$ by factoring or using the quadratic formula.

Notice that if the quadratic equation has two different solutions, then either one of those values can be used as the value of $r,$ so you’ve actually found two solutions. In fact, you have found all that you need to find every solution, as described in the specific example below.

As an example, consider the recurrence relation $a_n= 5 \cdot a_{n-1} - 6 \cdot a_{n-2}.$ Based on the previous argument, $a_n = r^{n}$ is a solution as long as $r^{2} = 5r - 6,$ that is, $r^{2} - 5r + 6 = 0.$ The quadratic equation has two solutions: $r = 2$ and $r = 3,$ so each of the closed forms $a_n = 2^{n}$ and $a_n = 3^{n}$ describes a solution (but notice that the initial values for $a_1$ are different). It is not too difficult to see that any constant multiple of either of the two solutions will give another solution; for example, $a_n = (-7) \cdot 2^{n}$ and $a_n = 5 \cdot 3^{n}$ are two more solutions. Also, a sum of any two solutions will still be a solution, so $a_n = (-7) \cdot 2^{n} + 5 \cdot 3^{n}$ is yet another solution. In fact, we will be able to prove in the chapter on mathematical induction that every solution of this recurrence relation is of the general closed form $a_n = \alpha \cdot 2^{n} + \beta \cdot 3^{n}$ where $\alpha$ ("alpha") and $\beta$ ("beta") are constants, which can be adjusted to match any initial values $a_0$ and $a_1$ you want to use: Notice that after substituting 0 for $n$ you get $a_0 = \alpha + \beta,$ and after substituting 1 for $n$ you get $a_1 = 2 \alpha + 3 \beta,$ so you need to solve a system of two linear equations in two unknowns to determine the values of $\alpha$ and $\beta.$

Note: In the case where the quadratic has only one solution $r$ (that is, $r$ is a "double root"), the general closed form solution is $a_n = (\alpha \cdot n + \beta) \cdot r^{n}$ where $\alpha$ and $\beta$ are constants.
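The last step of Example 11, matching $\alpha$ and $\beta$ to given initial values, can be sketched in Python for the recurrence $a_n = 5 a_{n-1} - 6 a_{n-2}$ with roots 2 and 3. The initial values $a_0 = 2$ and $a_1 = 5$ are hypothetical and chosen only for illustration, and the helper names are not from the text. The sketch covers only the case of two different roots.

```python
from fractions import Fraction


def solve_constants(r1, r2, a0, a1):
    """Solve alpha + beta = a0 and alpha*r1 + beta*r2 = a1 (needs r1 != r2)."""
    beta = Fraction(a1 - a0 * r1, r2 - r1)
    alpha = a0 - beta
    return alpha, beta


def a_recursive(n, a0, a1):
    """Term a_n of a_n = 5*a_{n-1} - 6*a_{n-2}, computed step by step."""
    prev, curr = a0, a1
    for _ in range(n):
        prev, curr = curr, 5 * curr - 6 * prev
    return prev


a0, a1 = 2, 5                                  # hypothetical initial values
alpha, beta = solve_constants(2, 3, a0, a1)


def a_closed(n):
    """General closed form alpha * 2^n + beta * 3^n."""
    return alpha * 2 ** n + beta * 3 ** n


print(alpha, beta)                                                # 1 1
print(all(a_closed(n) == a_recursive(n, a0, a1) for n in range(10)))  # True
```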

You Try

First, find the general closed form solution for the recurrence relation $b_n=-b_{n-1}+20b_{n-2}.$ Next, find the constants $\alpha$ and $\beta$ if the initial conditions $b_0 =1$ and $b_1=2$ must also be satisfied.

7.4. Towers Of Hanoi

The Towers of Hanoi is a game that was introduced and sold by the French mathematician Édouard Lucas in the 1880s. Lucas stated that the game is "professedly of Indo-Chinese origin" but it seems that Lucas invented this story to market the game.
Image credit: "PSM V26 D464 The tower of hanoi.jpg". This work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer.

At the start of the game, a set of disks of different radii are stacked on a single peg to form a "tower." The disks are stacked so that the radii of the disks decrease as you move up the stack. There are also two empty pegs. The game is won by moving the stack of disks from the original peg to another peg using the following rules:

Only one disk can be moved at a time.
The disk at the top of a stack can be moved to the top of another stack or on to an empty peg.
A disk can never be placed on top of a disk that has a smaller radius.
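The three rules can be expressed as a small move-checking function. This is only a sketch of the rules, not a solver: it assumes each peg is modeled as a Python list of disk radii ordered from bottom to top, and the name `move_disk` is made up for this illustration.

```python
def move_disk(pegs, source, target):
    """Move the top disk of pegs[source] onto pegs[target], enforcing the rules.

    Each peg is a list of disk radii listed from bottom to top.
    """
    if not pegs[source]:
        raise ValueError("there is no disk to move on the source peg")
    disk = pegs[source][-1]                      # only the top disk may move
    if pegs[target] and pegs[target][-1] < disk:
        raise ValueError("cannot place a disk on top of a smaller disk")
    pegs[target].append(pegs[source].pop())      # exactly one disk moves


pegs = [[3, 2, 1], [], []]     # a 3-disk tower on the first peg
move_disk(pegs, 0, 1)
print(pegs)                    # [[3, 2], [1], []]

try:
    move_disk(pegs, 0, 1)      # tries to put disk 2 on top of disk 1
except ValueError as err:
    print("illegal move:", err)
```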

The Towers of Hanoi can be used to explore recursive algorithms, complexity of algorithms, and recurrence relations, based on the following questions.

What is the minimal number of moves needed to win the game when there is 1 disk? 2 disks? 3 disks? $n$ disks?
What relationship (if any) is there between the minimal number of moves needed to win an $n$-disk game and the minimal number of moves needed to win an $(n-1)$-disk game?

MORE TO COME!

8. Functions

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on March 11, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Informally, a function $f$ from set D to set C is a rule that assigns to each input element in D exactly one output element from C. The set D is called the domain of the function, and the set C is called the codomain of the function. This informal definition was given in the chapter Introducing Discrete Mathematics.

The informal definition implies that every element in the domain D is an "input" that is assigned an output value in the codomain C. The informal definition does not imply that every element in the codomain C is an output for some input in the domain D. This second point may seem unimportant since you usually only care about the outputs you can actually get from a function, but the example presented in the next section shows why it is important to be precise about what set the codomain is. A formal definition of function is introduced in this chapter to address this need for precision.

Key terms and concepts covered in this chapter:

Functions
- Domain
- Codomain
- Range
Properties of functions (injectivity, surjectivity, bijectivity)
- A bijective function is the same as a one-to-one correspondence
- An injective function is one that assigns every pair of different inputs to a pair of different outputs
- A surjective function is one whose range is equal to its codomain (that is, every element of the codomain is an output assigned to one or more inputs)
- A bijective function is both injective and surjective
Inverse functions
Composition of functions

8.1. Why Specifying the Codomain is Important

The following example compares two implementations, in different programming languages, of the "same" function: The input values are the same, and the rule that assigns to each input its one and only output is the same. However, the two implementations use different codomains, which affects how the output values can be used.

Example 1 - The floor() functions in Python and Java

In computing, integer data types are used to represent loop counters or indices into arrays, while floating-point data types are used to represent real numbers (like decimals) in scientific or financial calculations. In general, a number that can be stored using an integer type can also be stored using a floating-point type, but the ways in which that number can be used will depend on the data type. The following code examples show why it is important to keep this in mind when coding.

First, recall that the floor of x, written as $\lfloor x \rfloor,$ is the greatest integer less than or equal to the real number x. The floor() function is available in both Python (as math.floor()) and Java (as Math.floor()) as an implementation of $\lfloor x \rfloor.$ In both programming languages floor() takes a double precision floating-point number as its input, but Python's floor() returns a value of integer data type while Java's floor() returns a value of floating-point data type.

More detail about floating point

On most hardware, both Python’s float data type and Java’s double data type are implementations of IEEE 754 double precision 64-bit floating-point numbers. For example, the decimal number 1.4 is encoded by the bitstring \[0011111111110110011001100110011001100110011001100110011001100110\] of length 64 whether you use a Python float with value 1.4 or a Java double with value 1.4. The underlying bitstring used to encode 1.4 is the same in both languages.
The floor() function in Python takes a float as input and returns an int.
The floor() function in Java takes a double as input and returns a double.
This means that the floor() functions in Python and Java have the same domain and use, essentially, the same rule to compute output values, but have different codomains (that are represented by different data types in the two languages).

Notice that in the Python code below, the return value from the floor() function is an int which we can then use as an index into list L.
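A minimal Python sketch of this situation follows. The contents of the list `L` and the input value `2.7` are hypothetical and chosen only for illustration.

```python
import math

L = ["a", "b", "c", "d"]     # hypothetical list
x = 2.7                      # hypothetical float input

i = math.floor(x)            # Python's floor returns an int
print(type(i).__name__)      # int
print(L[i])                  # c  (an int is a valid list index)
```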


Now notice that in the Java code below, the return value from the floor() function is a double, which we cannot use as an index into array L without error. We must use a composition of functions (Java’s floor() function followed by the function that casts an input of type double to an output of type int) to get the correct data type for an array index.
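The same obstacle can be imitated in Python. The sketch below is not Java: `java_style_floor` is a hypothetical stand-in that applies the same rule as floor() but, like Java's version, returns a floating-point value. One difference to keep in mind is that Java rejects a double index when the program is compiled, while Python raises a `TypeError` when the line runs.

```python
import math


def java_style_floor(x: float) -> float:
    """Hypothetical stand-in for Java's floor(): same rule, float output."""
    return float(math.floor(x))


L = ["a", "b", "c", "d"]     # hypothetical list
y = java_style_floor(2.7)    # 2.0, a float rather than an int

try:
    L[y]                     # a float is not a valid index
except TypeError as err:
    print("index rejected:", err)

i = int(y)                   # composition: floor first, then cast to int
print(L[i])                  # c
```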


The formal definition given in the next section will let us distinguish between two functions that use the same rule and have the same domain but have different codomains.

8.2. A Formal Definition Of Function

Definition

A function $f$ from set $A$ to set $B$ is an ordered triple $(f,\, A,\, B)$ consisting of sets $f,$ $A,$ and $B$ such that

$f$ is a subset of the Cartesian product $A \times B$ and
each element of $A$ appears as the first coordinate of exactly one pair $( a, \, b) \in f.$

That is, $f \subseteq A \times B$ and for each element $a \in A$ there is exactly one $b \in B$ such that $(a,\, b) \in f$. The set $f$ of ordered pairs is called the graph of the function. The set $A$ is called the domain of the function and the set $B$ is called the codomain of the function.
Note: This definition of a function as an ordered triple is based on the Bourbaki definition in the 1970 book Théorie des ensembles.
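For finite sets, the two conditions in the definition can be checked directly. The following Python sketch assumes the graph is given as a set of pairs; the function name `is_function` and the sample sets are made up for this illustration.

```python
def is_function(graph, domain, codomain):
    """Check both conditions of the formal definition, for finite sets."""
    # Condition 1: the graph is a subset of the Cartesian product A x B.
    is_subset = all(a in domain and b in codomain for (a, b) in graph)
    # Condition 2: each element of A is the first coordinate of exactly one pair.
    first_coords = [a for (a, _) in graph]
    exactly_one = all(first_coords.count(a) == 1 for a in domain)
    return is_subset and exactly_one


A = {1, 2, 3}
B = {"x", "y"}

print(is_function({(1, "x"), (2, "x"), (3, "y")}, A, B))            # True
print(is_function({(1, "x"), (1, "y"), (2, "x"), (3, "y")}, A, B))  # False: 1 has two outputs
print(is_function({(1, "x"), (2, "x")}, A, B))                      # False: 3 has no output
```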

Why would we need such a highly technical formal definition? The ordered triple is used because we need to be able to distinguish two functions that have the same graph, as a set of ordered pairs, but different codomains. (Notice that if two functions have the same graph then they must have the same domain.) If this is not clear, see the example "Three closely-related functions, no two of which are equal," which comes after the definitions listed below.

We write $f : A \rightarrow B$ to state that $f$ is a function from set $A$ to set $B.$
We often refer to the ordered triple as "f" without explicitly mentioning the other two members of the ordered triple. That is, we refer to the function as its set of ordered pairs $f$, but it is very important to remember that the actual definition includes the domain and codomain, too.
It is important to note that the graph of $f$ is the set of ordered pairs which we often represent by plotting points, but that plot is only a representation of the graph (in the same way that "five" and "cinco" are verbal representations of a number but are not the number itself.)
We write $f(a)=b$ instead of $(a,\, b) \in f$. The value $b = f(a)$ is called the image of $a$ assigned by $f,$ and $a$ is called the pre-image of $b.$

The range of $f$ is the set $\{ f(a) : a \in A \}$, that is, the set of all images (output values) assigned by $f.$ The range is the set of $b \in B$ such that there is at least one ordered pair $( a, \, b) \in f.$
Two functions are equal if they have the same graph, the same domain, and the same codomain. That is, the functions $(f,\, A,\, B)$ and $(g,\, S,\, T)$ are equal if they are identical as ordered triples: $f = g$ and $A = S$ and $B = T.$ We can also simply say that "$f$ and $g$ are the same function."
Notice that the graph $f$ in the formal definition replaces the rule used in the informal definition. Given the graph, which is the set of ordered pairs, we can state a rule as "given an input $a \in A,$ the output is the one $b \in B$ such that $(a,\, b) \in f.$" This is exactly how you would use a table of values to represent a function: Find the row with the input value, then choose the value in the output column.

The graph (i.e., the set of ordered pairs), the domain, and the codomain determine the function, NOT the formula, words, table, plot, or code used to describe a rule for the function.

The graph of a function determines how to assign each input to its output. For example, the functions $f: \mathbb{R} \rightarrow \mathbb{R}$ and $g: \mathbb{R} \rightarrow \mathbb{R}$ defined by the formulae $f(x) = |x|$ and $g(x) = \sqrt{x^{2}}$ are equal, and in fact are one and the same function, because $f = g$ as sets, so the "two" functions have the same graph, the same domain, and the same codomain. The "two" functions are just two ways of describing the same ordered triple.

Example 2 - Three closely-related functions, no two of which are equal.

Consider functions $f$, $g$, and $h$ defined as follows:

$f : \mathbb{R} \rightarrow \mathbb{R}$ is defined by $f = \{ (x,\,x^{2}) \mid x \in \mathbb{R} \}.$

$g : \mathbb{R} \rightarrow \mathbb R_{\ge 0}$ is defined by $g = \{ (x,\,x^{2}) \mid x \in \mathbb{R} \}.$

$h : \mathbb{N} \rightarrow \mathbb{N}$ is defined by $h = \{ (x,\,x^{2}) \mid x \in \mathbb{N} \}.$

No two of these functions are equal even though they can all be described by the rule "the output is the square of the input" and have identical formulas: $f(x) = x^{2},$ $g(x) = x^{2},$ and $h(x) = x^{2}.$ Each of the functions is defined for a different domain and/or codomain than the other two. In particular, $f$ and $g$ are not equal because they have different codomains, even though the two functions have the same graph and the same domain.
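The comparison of $f$ and $g$ can be imitated with finite sets. In the sketch below, the small sets are hypothetical stand-ins for $\mathbb{R}$ and $\mathbb{R}_{\ge 0}$ (which are infinite and cannot be listed), and each function is stored as a tuple (graph, domain, codomain).

```python
A = frozenset({-2, -1, 0, 1, 2})              # finite stand-in for the domain
graph = frozenset((x, x * x) for x in A)      # the rule "square the input"

f = (graph, A, frozenset(range(-4, 5)))       # codomain includes negative numbers
g = (graph, A, frozenset(range(0, 5)))        # codomain has no negative numbers

print(f[0] == g[0])   # True:  same graph
print(f[1] == g[1])   # True:  same domain
print(f == g)         # False: different codomains, so different functions
```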

8.3. Properties of Functions

In this subsection you will learn about several properties of functions.

8.3.1. Injective Functions

A function $f : A \rightarrow B$ is injective if distinct elements of the domain $A$ are mapped to distinct elements of the range. That is, for all $a_1$ and $a_2$ in $A,$ if $a_1 \neq a_2$ then $f(a_1) \neq f(a_2).$ Using the contrapositive, this can be stated as: For all $a_1$ and $a_2$ in $A,$ if $f(a_1) = f(a_2)$ then $a_1 = a_2.$
Note: Injective functions are also called one to one functions. The Remix avoids this term because it is easy to confuse "one to one function" with "one-to-one correspondence."

Example 3 - Injective Functions

Consider the functions
$f : \mathbb{Z} \rightarrow \mathbb{Q}$ defined by $f(n) = 2^{n},$
$g : \mathbb{Z} \rightarrow \mathbb{Z}$ defined by $g(n) = n^{2},$
$h : \mathbb{Z} \rightarrow \mathbb{Z}$ defined by $h(n) = n + 2,$ and
$k : \mathbb{Z} \rightarrow \mathbb{Z}$ defined by $k(n) = \frac{1}{4}((-1)^n (2n+1) - 1).$

$f$ is injective because different input values must be mapped to different output values. Notice that $f(a) = f(c)$ means that $2^{a} = 2^{c}$ from which $2^{a} / 2^{c} = 1 = 2^{a-c}$ must be True, so $a-c = 0$ must be True, that is, $a = c.$

$g$ is not injective because the input values $2$ and $-2$ are mapped to the same output value: $(2)^2 = 4$ and $(-2)^2 = 4.$

$h$ is injective because $h(a) = h(c)$ means that $a+2 = c+2,$ which means that $a=c.$

$k$ is not injective because the input values $-1$ and $0$ are mapped to the same output value, 0.
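A short search can find the collisions named in Example 3. The Python sketch below looks for two different inputs with the same output over a finite window of integers; the window and the helper name `find_collision` are choices made here. Finding a collision shows that a function is not injective, but finding none on a finite window does not prove injectivity, which still requires an argument like the ones given for $f$ and $h$.

```python
from fractions import Fraction


def find_collision(func, inputs):
    """Return two different inputs with equal outputs, or None if none is found."""
    seen = {}
    for a in inputs:
        b = func(a)
        if b in seen:
            return seen[b], a
        seen[b] = a
    return None


def f(n):
    return Fraction(2) ** n          # exact rational value of 2^n


def g(n):
    return n * n


def h(n):
    return n + 2


def k(n):
    sign = 1 if n % 2 == 0 else -1   # (-1)^n using integers only
    return (sign * (2 * n + 1) - 1) // 4


window = range(-5, 6)
print(find_collision(f, window))     # None
print(find_collision(g, window))     # a pair such as (-1, 1)
print(find_collision(h, window))     # None
print(find_collision(k, window))     # (-1, 0)
```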

8.3.2. Surjective Functions

A function $f$ from the set $A$ to the set $B$ is surjective if the image set of $A$ is the entire set $B$. This means that for each element $b$ in the codomain $B,$ there is some element $a \in A$ with $f(a)=b$.
Note: Surjective functions are also called onto functions.

Example 4 - Surjective Functions

Consider again the functions $f,$ $g,$ $h,$ and $k$ from Example 3.

$f$ is not surjective since it is not possible for $2^{n}$ to have a value that is less than or equal to 0.

$g$ is not surjective because it is not possible for $n^{2}$ to have a value that is less than 0.

$h$ is surjective because every $b$ in the codomain $\mathbb{Z}$ is an output for some input: Notice that $h(b-2) = (b-2)+2 = b.$

$k$ is surjective because every $b$ in the codomain $\mathbb{Z}$ is an output for some nonnegative input - for inputs $n \geq 0,$ the outputs $k(n)$ are shown in the lower row of the image.

Notice that whether a function is surjective depends on what the function’s codomain is. This is, again, why the formal definition of function is needed.
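The claims about $k$ and $g$ can be explored numerically. The sketch below lists the first few outputs of $k$ for nonnegative inputs and then checks which integers in a small window of the codomain are missed by $g.$ The windows are arbitrary choices made here, and a finite check like this illustrates the claims without proving them.

```python
def g(n):
    return n * n


def k(n):
    sign = 1 if n % 2 == 0 else -1   # (-1)^n using integers only
    return (sign * (2 * n + 1) - 1) // 4


# Outputs of k for the inputs 0, 1, 2, ..., 6 alternate in sign.
print([k(n) for n in range(7)])      # [0, -1, 1, -2, 2, -3, 3]

targets = range(-3, 4)               # a window of the codomain Z
missed_by_k = [b for b in targets if all(k(n) != b for n in range(0, 20))]
missed_by_g = [b for b in targets if all(g(n) != b for n in range(-10, 11))]

print(missed_by_k)                   # []
print(missed_by_g)                   # [-3, -2, -1, 2, 3]
```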

8.3.3. Bijective Functions

A function $f$ is bijective if it is both injective and surjective.

Example 5 - Verifying a function is bijective

Verify that the function $f\left(x\right)=3x+5$, where $f:\mathbb{R}\rightarrow \mathbb{R}$, is bijective.

Solution

For injectivity, suppose $f\left(m\right)=f(n)$. We want to show $m=n$.

$f\left(m\right)=f(n)$

$3m+5=3n+5$

Subtracting 5 from both sides gives $3m=3n$, and then multiplying both sides by $\frac{1}{3}$ gives $m=n$.

To show that $f$ is surjective, we need to show that for any $c\in \mathbb{R}$ there is a corresponding $x \in \mathbb{R}$ for which $f\left(x\right)=c$. To find it, set $f\left(x\right)=3x+5$ equal to $c$ and solve for $x$.

$f\left(x\right)=3x+5=c$

Well, $3x+5=c$ gives $3x=c-5$ or $x=\frac{c-5}{3}$. So, for any $c$, there is an $x$, namely $x=\frac{c-5}{3}$, for which $f\left(x\right)=c$. Since $f$ is both injective and surjective, $f$ is bijective.

8.4. Inverse Functions

Informally, a function $f$ is invertible if each $b$ in the codomain $B$ is assigned to exactly one input $a$ in the domain $A.$

Formally, a function $f : A \rightarrow B$ is invertible if the ordered triple $(\{(b, a) \, | \, (a, b) \in f \},\, B,\, A)$ is a function.

The set $\{(b, a) \, | \, (a, b) \in f \}$ is usually denoted by $f^{-1}$ even in cases when $f$ is not invertible.

For example, if $f$ is invertible and the ordered pair $(a,b) \in f$ corresponds to $f(a)=b$, then the ordered pair $(b,a) \in f^{-1}$ corresponds to $f^{-1}(b)=a$, where $f^{-1}: B \rightarrow A$.

The following theorem shows that invertibility of a function is equivalent to bijectivity, or a function being both injective and surjective.

Theorem on Invertibility

A function $f: A \rightarrow B$ is invertible if and only if $f$ is bijective.
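For finite sets, the formal definition of invertibility can be tested by swapping the coordinates of every pair in the graph and asking whether the result is a function from $B$ to $A.$ The following Python sketch uses hypothetical sets and helper names chosen for illustration; it shows one bijective example and one example that fails to be injective.

```python
def is_function(graph, domain, codomain):
    """Check the formal definition of function for finite sets."""
    is_subset = all(a in domain and b in codomain for (a, b) in graph)
    first_coords = [a for (a, _) in graph]
    return is_subset and all(first_coords.count(a) == 1 for a in domain)


def inverse_graph(graph):
    """Swap the coordinates of every ordered pair in the graph."""
    return {(b, a) for (a, b) in graph}


A = {1, 2, 3}
B = {"x", "y", "z"}
bijective = {(1, "x"), (2, "y"), (3, "z")}
not_injective = {(1, "x"), (2, "x"), (3, "z")}   # also misses "y"

print(is_function(inverse_graph(bijective), B, A))       # True
print(is_function(inverse_graph(not_injective), B, A))   # False
```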

Being able to solve an equation amounts to being able to invert a function. Notationally, solving $f(x) =b$ means solving for $x$.

Using inverses, $f(x) =b$ is solved by $x=f^{-1}\left(b\right)$.

Consider, for example, $f\left(x\right)=x^3.$ We know

$f^{-1}\left(x\right)=\sqrt[3]{x}$

Solving $f\left(x\right)=8$ means solving $x^3=8$. To solve $f\left(x\right)=8$, we use $x=f^{-1}\left(8\right)$, which in this case means

$x=f^{-1}\left(8\right)=\sqrt[3]{8} = 2$

An easy check: $f\left(2\right)=2^3=8$ and

$f^{-1}\left(8\right)=\sqrt[3]{8} = 2$

Functions can, in many cases, be visualized graphically. For example, a function from the real line $\mathbb{R}$ to the real line can be plotted on a Cartesian plane.

In Appendix: Library of Functions, several functions and their plots are shown to illustrate the important concepts of functions, including domain, codomain, range, and invertibility.

8.5. The Algebra of Functions

If two functions $f\left(x\right)$ and $g\left(x\right)$ have the same domain $A$ and same codomain $\mathbb{R},$ then you can combine these functions using the operations addition, subtraction, multiplication, and division.

The Algebra of Functions

$\left(f+g\right)\left(x\right)=f\left(x\right)+g\left(x\right)$
$\left(f-g\right)\left(x\right)=f\left(x\right)-g\left(x\right)$
$\left(f\cdot\ g\right)\left(x\right)=f\left(x\right)\cdot\ g\left(x\right)$
$\left(\frac{f}{g}\right)\left(x\right)=\frac{f\left(x\right)}{g\left(x\right)},\ \ g\left(x\right)\neq0$

Example 6

Consider $f\left(x\right)=x^2+1$ and $g\left(x\right)=\sqrt x$ where $f,\ g: \mathbb{R}_{\geq0} \rightarrow \mathbb{R}$. Find the rules for the functions $\left(f+g\right)$, $\left(f-g\right)$, $\left(f\cdot\ g\right)$, and $\left(\frac{f}{g}\right).$

Solution

The common domain is $\mathbb{R}_{\geq0}$, since the square root is real valued only for $\ x\ \geq0$.

$\left(f+g\right)\left(x\right)=f\left(x\right)+g\left(x\right)=x^2+1+\sqrt x$, for $x \geq 0$

$\left(f-g\right)\left(x\right)=f\left(x\right)-g\left(x\right)=x^2+1- \sqrt x$, for $x \geq 0$

$\left(f\cdot\ g\right)\left(x\right)=f\left(x\right)\cdot\ g\left(x\right)=\left(x^2+1\right)\cdot\ \sqrt x$, for $x \geq 0$

$\left(\frac{f}{g}\right)\left(x\right)=\frac{f\left(x\right)}{g\left(x\right)}=\frac{x^2+1}{\sqrt x}$, for $x > 0$.

Notice that the domain of $\frac{f}{g}$ is $x>0$, because $g\left(0\right)=\sqrt0=0$, and division by $0$ is not defined.
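In a language where functions can be passed around as values, the four operations can be written as functions that build new functions. The Python sketch below applies them to the $f$ and $g$ of Example 6; the helper names `add`, `subtract`, `multiply`, and `divide` are chosen here for illustration.

```python
import math


def add(f, g):
    return lambda x: f(x) + g(x)


def subtract(f, g):
    return lambda x: f(x) - g(x)


def multiply(f, g):
    return lambda x: f(x) * g(x)


def divide(f, g):
    return lambda x: f(x) / g(x)     # not defined where g(x) == 0


def f(x):
    return x ** 2 + 1


g = math.sqrt

x = 4
print(add(f, g)(x), subtract(f, g)(x), multiply(f, g)(x), divide(f, g)(x))
# 19.0 15.0 34.0 8.5
```

Calling `divide(f, g)(0)` raises `ZeroDivisionError`, which matches the observation that $0$ is not in the domain of $\frac{f}{g}.$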

8.6. Composition of Functions

Suppose $g:A\rightarrow B$ and $f:B\rightarrow C$. Then the functions $f$ and $g$ can be composed to obtain a function $h:A\rightarrow C$, denoted as follows:

$h\left(x\right)=\left(f\circ g\right)\left(x\right)=f\left(g\left(x\right)\right)$ provided $x\ \in\ A$ and $g\left(x\right)\in B$.

Example 7

Consider $f\left(x\right)=\frac{1}{x}$ and $g\left(x\right)=2x-3$, where $g:\mathbb{R}\rightarrow \mathbb{R}$ and $f:\mathbb{R} \setminus \{0\}\rightarrow \mathbb{R}$. Notice that $g\left(x\right)$ is defined for all real $x$ and $f\left(x\right)$ is defined for all real $x\ \neq0$. Form the compositions $h\left(x\right)=\left(f \circ g\right)\left(x\right)$ and $k\left(x\right)=\left(g \circ f\right)\left(x\right)$. Also determine their respective domains.

Solution

$h\left(x\right)=\left(f \circ g\right)\left(x\right)=f\left(g\left(x\right)\right)=f\left(2x-3\right)=\frac{1}{2x-3}$. Here $x$ needs to be in the domain of $g$, which is all real $x$, and $g\left(x\right)$ needs to be in the domain of $f$. In particular $g\left(x\right)\neq 0$, or $2x-3\ \neq 0$, or $x\ \neq\frac{3}{2}$. So the domain of $h$ is the set of all real numbers except $\frac{3}{2}$.

By contrast, $k\left(x\right)=\left(g\circ f\right)\left(x\right)=g\left(f\left(x\right)\right)=g\left(\frac{1}{x}\right)=2\left(\frac{1}{x}\right)-3=\frac{2}{x}-3$. Here $x$ needs to be in the domain of $f$, or $x\ \neq 0$, and $f\left(x\right)$ needs to be in the domain of $g$, which adds no further restriction because $g$ accepts any real number. So the domain of $k$ is the set of all real numbers except $0$.
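Example 7 can be sketched in Python with a small `compose` helper (a name chosen here, not a built-in). Exact fractions are used in place of real numbers, which is an assumption made to avoid rounding error; inputs outside the domain show up as a `ZeroDivisionError`.

```python
from fractions import Fraction


def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def f(x):
    return 1 / Fraction(x)           # not defined at x = 0


def g(x):
    return 2 * Fraction(x) - 3


h = compose(f, g)                    # h(x) = 1 / (2x - 3)
k = compose(g, f)                    # k(x) = 2/x - 3

print(h(2), k(2))                    # 1 -2

try:
    h(Fraction(3, 2))                # g(3/2) = 0 is outside the domain of f
except ZeroDivisionError:
    print("3/2 is not in the domain of h")
```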

Example 8 - composing inverse functions

Consider $f\left(x\right)=x^3+1$ and $g\left(x\right)=\sqrt[3]{x-1}$, where $f,g:\mathbb{R}\rightarrow \mathbb{R}$. Show that $\left(g \circ f\right)\left(1\right)=1, \left(g \circ f\right)\left(2\right)=2, \left(g\circ f\right)\left(3\right)=3$, and $\left(g\circ f\right)\left(x\right)=x$.

Solution

$f\left(1\right)=1^3+1=2$

$f\left(2\right)=2^3+1=9$

$f\left(3\right)=3^3+1=28$

$f\left(x\right)=x^3+1$

Therefore,

$\left(g\circ f\right)\left(1\right)=g\left(f\left(1\right)\right)=g\left(2\right)=\sqrt[3]{2-1}=\sqrt[3]{1}=1$

$\left(g\circ f\right)\left(2\right)=g\left(f\left(2\right)\right)=g\left(9\right)=\sqrt[3]{9-1}=\sqrt[3]{8}=2$

$\left(g\circ f\right)\left(3\right)=g\left(f\left(3\right)\right)=g\left(28\right)=\sqrt[3]{28-1}=\sqrt[3]{27}=3$

$\left(g\circ f\right)\left(x\right)=g\left(f\left(x\right)\right)=g\left(x^3+1\right)=\sqrt[3]{x^3+1-1}=\sqrt[3]{x^3}=x$

Notice, in the last example, that $g\left(x\right)$ undoes $f\left(x\right)$, in the following sense:

$f:1\rightarrow 2$ and $g:2\rightarrow 1$, or the ordered pair $\left(1,2\right)$ in $f$, corresponds to $\left(2,1\right)$ for $g$.

$f:2\rightarrow 9$ and $g:9\rightarrow 2$, or the ordered pair $\left(2,9\right)$, in $f$, corresponds to $\left(9,2\right)$ for $g$.

$f:3\rightarrow 28$ and $g:28\rightarrow 3$, or the ordered pair $\left(3,28\right)$, in $f$, corresponds to $\left(28,3\right)$ for $g$.

$f:x\rightarrow x^3+1$ and $g:x^3+1\rightarrow x$, or the ordered pair $\left(x,x^3+1\right)$, in $f$, corresponds to $\left(x^3+1,x\right)$ for $g$.

The function $g\left(x\right)=\sqrt[3]{x-1}$ is said to be the inverse of the function $f\left(x\right)=x^3+1$. We have shown explicitly that $\left(g\circ f\right)\left(x\right)=x$.

8.6.1. Inverse Functions and Composition

Notice that if you happen to have two functions $f : A \rightarrow B$ and $g : B \rightarrow A$ such that $(g \circ f)(a) = g(f(a)) = a$ for every $a \in A$ and $(f \circ g)(b) = f(g(b)) = b$ for every $b \in B,$ then $f$ and $g$ are inverse functions.

Example 9 - finding an inverse

Find the inverse $g\left(x\right)$ of the bijective function $f\left(x\right)=3x+5$ for $f,\ g:\mathbb{R}\rightarrow \mathbb{R}$. Verify the inverse by showing that $\left(f \circ g\right)\left(x\right)=x=\left(g \circ f\right)\left(x\right)$.

Show specifically that $f\left(2\right)=11$, and $g\left(11\right)=2$.

Solution

If $f:x\rightarrow y$ corresponds to $(x,y)$, then the inverse $g:y\rightarrow x$ corresponds to $(y,x)$. This means that the inverse of the relation $y=f\left(x\right)=3x+5$, is the relation $x=f\left(y\right)=3y+5$.

Solving for $y$ in $x=f\left(y\right)$, gives $f^{-1}(x)=y$. Solving for $y$ in $x=f\left(y\right)=3y+5$, gives $x-5=3y$ or $\frac{x-5}{3}=y=\ f^{-1}(x)=g(x)$.

We now verify that $\left(f\circ g\right)\left(x\right)=x=\left(g \circ f\right)\left(x\right)$.

$\left(f\circ g\right)\left(x\right)=f\left(\frac{x-5}{3}\right)=\ 3\left(\frac{x-5}{3}\right)+5=\left(x-5\right)+5=x$,

and $\left(g \circ f\right)\left(x\right)=g\left(3x+5\right)=\ \frac{(3x+5)-5}{3}=\frac{3x+5-5}{3}=\frac{3x}{3}=x$.

Finally $f\left(x\right)=3x+5$, and $f\left(2\right)=3\left(2\right)+5=6+5=11$, or $f:2\rightarrow 11$

and $g\left(x\right)=\frac{x-5}{3}$, and $g\left(11\right)=\frac{11-5}{3}=\frac{6}{3}=2$, or $g:11\rightarrow 2$.
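The algebra in Example 9 can be spot-checked in Python. The sketch below uses exact fractions as stand-ins for real inputs (an assumption made to avoid rounding error) and checks both compositions on a handful of sample values; this supports the algebraic verification above but does not replace it.

```python
from fractions import Fraction


def f(x):
    return 3 * x + 5


def g(x):
    return (x - 5) / Fraction(3)     # the inverse found in Example 9


samples = [Fraction(n, 4) for n in range(-20, 21)]   # arbitrary sample inputs

print(f(2), g(11))                                   # 11 2
print(all(f(g(x)) == x for x in samples))            # True
print(all(g(f(x)) == x for x in samples))            # True
```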

8.7. Exercises

Remixer’s Note: This section is taken from the original “Discrete Math” book with only minor changes.

What can be said about the relation $f:A\rightarrow B$, if
1. $\exists z\in B\forall x\in A,f\left(x\right)\neq z$
2. $\exists x,y \in A, \exists z\in B,\left(x\neq y\right)\wedge\left(f\left(x\right)=f\left(y\right)=z\right)$
3. $\forall x,y\in A, \left(f\left(x\right)=f\left(y\right)\right)\ \rightarrow\left(x=y\right)$
4. $\forall x,y\in A,\left(x\neq y\right)\rightarrow\left(f\left(x\right)\neq f\left(y\right)\right)$
5. $\forall z\in B, \exists x \in A,f\left(x\right)=z$
6. $\exists x,y\in A,\left(f\left(x\right)=f\left(y\right)\right)\wedge\left(x\ \neq\ y\right)$
Explain why the exponential function $f(x)=2^x$ is not surjective from $f: \mathbb{R} \rightarrow \mathbb{R}$, but is in fact a bijection from $f: \mathbb{R} \rightarrow \mathbb{R}^+$.
Use properties of logarithms to show that $f\left(x\right)=2^x$ and $g\left(x\right)=\log_2{x}$, where $f: \mathbb{R} \rightarrow \mathbb{R}^+$ and $g: \mathbb{R}^+ \rightarrow \mathbb{R}$, are inverses by verifying that $f\left(g\left(x\right)\right)=x$ for all $x \in \mathbb{R}^+$ and $g\left(f\left(x\right)\right)=x$ for all $x \in \mathbb{R}$.
Use properties of logarithms to show that $f\left(x\right)=10^x$ and $g\left(x\right)=\log_{10}{x}$, where $f: \mathbb{R} \rightarrow \mathbb{R}^+$ and $g: \mathbb{R}^+ \rightarrow \mathbb{R}$, are inverses by verifying that $f\left(g\left(x\right)\right)=x$ for all $x \in \mathbb{R}^+$ and $g\left(f\left(x\right)\right)=x$ for all $x \in \mathbb{R}$.
Show that the function $f\left(x\right)=5x-3$, from $f: \mathbb{R} \rightarrow \mathbb{R}$, is bijective and find its inverse.
Show that the function $f\left(x\right)=2x^3-1$, from $f: \mathbb{R} \rightarrow \mathbb{R}$ is bijective and find its inverse.
Consider the function $f(x) = \left \lceil x \right \rceil$ where $f:\mathbb{R}\rightarrow\mathbb{Z}$.
1. Is the function a surjection? Explain.
2. Is the function an injection? Explain
3. Is the function a bijection? Explain
4. Is the inverse mapping a function? Why or why not?
5. Evaluate
  1. $f\left(-2.1\right)$
  2. $f\left(-1.9\right)$
  3. $f\left(1.5\right)$
  4. $f\left(1.9\right)$
  5. $f\left(2\right)$
  6. $f\left(2.3\right) $
6. Suppose $g\left(x\right)=2x$, with $f\left(x\right)=\left\lceil x\right\rceil$. Evaluate the following:
  1. $f\left(g\left(2.3\right)\right)$
  2. $g\left(f\left(2.3\right)\right)$
Explain why the ceiling function $f(x) = \left \lceil x \right \rceil$ is not surjective from $f: \mathbb{R} \rightarrow \mathbb{R}.$
Consider the function $f(x) = \left \lfloor x \right \rfloor$ where $f:\mathbb{R}\rightarrow\mathbb{Z}$.
1. Is the function a surjection? Explain.
2. Is the function an injection? Explain
3. Is the function a bijection? Explain
4. Is the inverse mapping a function? Why or why not?
5. Evaluate
  1. $f\left(-5.1\right) $
  2. $f\left(-3.9\right)$
  3. $f\left(-3.2\right)$
  4. $f\left(5\right)$
  5. $f\left(5.3\right)$
6. Suppose $g\left(x\right)=3x$, with $f\left(x\right)=\left\lfloor x\right\rfloor$. Evaluate the following:
  1. $f\left(g\left(5.3\right)\right)$
  2. $g\left(f\left(5.3\right)\right)$
The absolute value function, denoted $f(x)=|x|$, where $f:\mathbb{R} \rightarrow \mathbb{R}$, gives the distance from $x$ to $0$. For example, $f\left(2.5\right)=\left|2.5\right|=2.5$ and $f\left(-4.5\right)=\left|-4.5\right|=4.5$. Notice that if $x \geq 0$, then $\left|x\right|=x$. However, if $x<0$, then $\left|x\right|=\ -x$. We can state this using the notation for piecewise functions:

\[ f(x) = |x| = \begin{cases} x, & \text{if } x \geq 0 \\ -x, & \text{if } x < 0 \end{cases} \]
1. Graph $f\left(x\right)=|x|$, for $-10 \le x \le 10$
2. Evaluate
  1. $f(-5)=|-5|$,
  2. $f(-2.5)=|-2.5|$,
  3. $f(3.5)=|3.5|$.
3. Show that $f\left(x\right)=\left|x\right|$, with $f:\mathbb{R}\rightarrow \mathbb{R}$, is not injective.
4. Show that $f\left(x\right)=\left|x\right|$, with $f:\mathbb{R}\rightarrow \mathbb{R}$, is not surjective.
5. Consider $g\left(x\right)=3x+2$, with $g:\mathbb{R}\rightarrow \mathbb{R}$, and $f\left(x\right)=|x|$. Find and simplify the following:
  1. $\left(g\circ f\right)\left(x\right)$
  2. $\left(f\circ g\right)\left(x\right)$
A real-valued function, $f: \mathbb{R} \rightarrow \mathbb{R}$, is said to be strictly increasing if $f(x)<f(y)$ whenever $x<y$.
1. State this using logical quantifiers.
2. State a similar definition for a strictly decreasing function, and then translate using logical quantifiers.

9. Relations

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on October 20, 2025.
Revised the sections/subsections on partial orders, well-ordering, and modular arithmetic.
Made other minor revisions and fixed typos.

Relations are used to describe an association of data.

For example, imagine a university database of students. The database needs to have a record for each student, and the student’s record needs to include fields for the student’s name(s), the unique student ID number for the student, the student’s current status, a list of courses that the student has enrolled in or has completed (along with the grade earned in each completed course,) and possibly other data associated with the student. One way to visualize the database is as a two-dimensional table, similar to a spreadsheet worksheet, where each row corresponds to a record and each column corresponds to a field; each row can be treated as an ordered $n$-tuple, where $n$ is the number of fields.

In this chapter, you will learn about the formal definition of relation, operations and properties of relations. You will also learn about some special types of relations, namely partial orderings and equivalence relations. As a special case of equivalence relations, you will learn about congruence relations of integers as well as modular arithmetic.

Key terms and concepts covered in this chapter:

Relations
Properties of relations (reflexivity, symmetry, transitivity, and other properties)
Equivalence relations
- Equivalence classes
Modular arithmetic
- Congruences
Partial orders
Well orderings

9.1. Definition of Relation

Informally, a relation on two or more sets is an association between elements of those sets. Formally, a relation is a subset of a Cartesian product of two or more sets.
For simplicity, it is assumed in this textbook that the number of sets used to form the Cartesian product is finite. It is possible to define relations as subsets of a Cartesian product of infinitely many sets, in which case the elements of the relation are infinite sequences.

Definition

An $n$-ary relation on the sets $A_{1}, \, A_{2}, \, \ldots, \, A_{n},$ where $n \geq 2$ is a natural number, is a subset $R$ of the Cartesian product $A_{1} \times A_{2} \times \cdots \times A_{n}.$ That is, any subset $R \subseteq A_{1} \times A_{2} \times \cdots \times A_{n}$ is an $n$-ary relation on the sets $A_{1}, \, A_{2}, \, \ldots, \, A_{n}.$

The sets $A_{1}, \, A_{2}, \, \ldots \, A_{n}$ are called the domains of the relation $R.$

The number of sets $n$ is called the degree of the relation $R.$

Here are several examples of relations.

Example 1 - Relations

Let $A$ be the set of names of students currently enrolled at a college, $B$ be the set of all possible student ID numbers, and $C$ be the set of all classes currently offered at the college. The set \[ R = \{ (x, y, z) \mid \text{Student } x \text{ has ID number } y \text{ and is enrolled in class } z \} \] is a 3-ary relation, also called a ternary relation, on $A,$ $B,$ and $C.$
Let $S$ be the set of names of students currently enrolled at a college. The set \[ R = \{ (x, y) \mid \text{Student } x \text{ is enrolled in a class section in which student } y \text{ also is enrolled} \} \] is a 2-ary relation, usually called a binary relation on $S$ (since the two sets in $S \times S$ are the same set.) Most of the focus of this chapter will be on binary relations.

Let $f : A \rightarrow B$ be any function, as defined formally in the Functions chapter. Recall that part of the formal definition states that $f$ is a subset of $A \times B,$ so the set $f$ is a binary relation on the domains $A$ and $B.$

Notice that for any sets $A_{1}, \, A_{2}, \, \ldots \, A_{n},$ where $n \geq 2$ is a natural number, we always have the following two relations:

$\emptyset,$ called the empty relation. This relation is also called the void relation and trivial relation in other sources.
$A_{1} \times A_{2} \times \cdots \times A_{n},$ called the universal relation.

9.2. Binary Relations on a Single Set

In many cases, the two domains of a binary relation are the same set; for example, the relation may involve comparing two elements of a set $S$ in some way. In this case, the domain $S$ is mentioned only once: A binary relation on the set $S$ is any subset $R$ of the Cartesian product $S \times S$, that is, $R \subseteq S \times S.$

In the case of a binary relation on $S,$ we often write $aRb$ to mean the same thing as $(a,b) \in R.$ This notation may make more sense after reading the following example.

Example 2 - Binary Relations on a Set

Here are some examples of binary relations on one set.

$R = \{ (x, y) \in \mathbb{R} \times \mathbb{R} \, | \, x \leq y \}$. $R$ is a binary relation on $\mathbb{R}.$ We write $x \leq y$ instead of $(x,y) \in R$ (Notice that in this case, we are writing $xRy$ but replacing the $R$ by the symbol $\leq.$)
$D = \{ (a, b) \in \mathbb{Z} \times \mathbb{Z} \, | \, a \text{ is a divisor of } b \}$. $D$ is a binary relation on $\mathbb{Z}.$ In this case, we often write $a | b$ instead of $(a,b) \in D.$ You may not be surprised to learn that this relation is called the divisibility relation on the integers.
$M = \{ (a, b) \in \mathbb{Z} \times \mathbb{Z} \, | \, 2 \text{ is a divisor of } (a-b) \}$. $M$ is a binary relation on $\mathbb{Z},$ called congruence modulo 2. This textbook uses the common but nonstandard notation $a \equiv_{2} b$ instead of $(a,b) \in M.$
Note that the ISO standard notation for this relation is $a \equiv b \ \text{mod } 2$ and that other sources use $a \equiv b \ (\text{mod } 2).$

For a set $S,$ recall that $\mathcal{P}(S)$ is the power set of $S,$ that is, the set whose elements are all possible subsets of $S.$ Let $R = \{ (A, B) \in \mathcal{P}(S) \times \mathcal{P}(S) \, | \, A \subseteq B \}.$ $R$ is a binary relation on $\mathcal{P}(S),$ and we write $A \subseteq B$ instead of $(A,B) \in R.$
For a set $S,$ we can also define a different relation $R = \{ (A, B) \in \mathcal{P}(S) \times \mathcal{P}(S) \, | \, A \subset B \},$ that is, $A$ is a proper subset of $B.$ This relation is also a binary relation on $\mathcal{P}(S),$ and we write $A \subset B$ instead of $(A,B) \in R.$
Let $S$ be any nonempty set. The set $\mathbf{id}_S = \{ (x, x) \in S \times S \, | \, x \in S \}$ is a binary relation on $S$ which we refer to as the identity relation on $S$ (Other sources call this the diagonal of $S,$ or simply the equality relation on $S.$) We write $a=b$ instead of $(a,b) \in \mathbf{id}_S.$

$L = \{ \text{("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")} \}$ is a binary relation on the set $\{ \text{"rock", "paper", "scissors"} \}.$ We write $xLy$ (that is, "$x$ loses to $y$") instead of $(x,y) \in L.$
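When the underlying set is finite, a binary relation can be modeled directly in Python as a set of ordered pairs (tuples). The following is only a minimal sketch: the finite set `S` and the variable names `D` and `L` are chosen here for illustration, and the divisibility relation is restricted to `S` rather than defined on all of $\mathbb{Z}.$

```python
# A finite binary relation modeled as a set of ordered pairs (tuples).
S = {1, 2, 3, 4, 5, 6}  # illustrative finite set (an assumption, not from the text)

# Divisibility restricted to S: (a, b) is in D exactly when a is a divisor of b.
D = {(a, b) for a in S for b in S if b % a == 0}

# The "loses to" relation from the rock-paper-scissors example.
L = {("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")}

print((2, 6) in D)             # True, since 2 | 6
print((4, 6) in D)             # False, since 4 is not a divisor of 6
print(("rock", "paper") in L)  # True, since rock loses to paper
```

Testing whether $aRb$ holds is then just a membership test, `(a, b) in R`.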

9.2.1. Operations on Binary Relations on a Set

Given binary relations $Q$ and $R$ on a set $S,$ we can define several other relations in terms of $Q$ and $R.$ These operations are likely familiar to you as operations on functions but they also work for binary relations on a single set.

The inverse of $R$ is the relation $R^{-1} = \{ (b,a) \, | \, (a,b) \in R \}.$
The composition of Q and R is $R \circ Q = \{ (a,c) \, | \, (a,b) \in Q \land (b,c) \in R \}.$
The $n$th power of $R$ is defined recursively for all $n \in \mathbb{N}$ as follows.
- $R^{0} = \mathbf{id}_S$
- $R^{k+1} = R \circ R^{k}$ for natural numbers $k \geq 0.$
  The recursion step uses $k$ instead of $n$ in preparation for the type of arguments used in the chapter on proof by mathematical induction.

Building on the $n$th powers of $R,$ we can define two relations.

$R^{+}$ is the relation $\{ (a,b) \in S \times S \, | \, (a,b) \in R^{k} \text{ for some positive integer } k \}.$ That is, $R^{+}$ is the union of all the positive $n$th powers of $R.$
$R^{*}$ is the relation $\{ (a,b) \in S \times S \, | \, (a,b) \in R^{k} \text{ for some natural number } k \}.$ That is, $R^{*}$ is the union of all the natural number $n$th powers of $R.$

Notice that $R^{*} = \mathbf{id}_S \cup R^{+}.$
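For a relation on a finite set, the operations above can be sketched in a few lines of Python. The function names below are hypothetical helpers chosen for this illustration. The sketch also relies on one fact not stated above: on a set with $|S|$ elements, the union of the powers $R^{1}, R^{2}, \ldots, R^{|S|}$ already equals $R^{+},$ because any longer chain of related elements must repeat an element and can be shortened.

```python
def identity(S):
    """The identity relation id_S."""
    return {(x, x) for x in S}

def inverse(R):
    """R^{-1}: reverse every ordered pair of R."""
    return {(b, a) for (a, b) in R}

def compose(R, Q):
    """R o Q: all (a, c) such that (a, b) is in Q and (b, c) is in R for some b."""
    return {(a, c) for (a, b) in Q for (b2, c) in R if b == b2}

def power(R, S, n):
    """R^n, using R^0 = id_S and R^(k+1) = R o R^k."""
    result = identity(S)
    for _ in range(n):
        result = compose(R, result)
    return result

def plus(R, S):
    """R^+ on a finite set S: the union of R^1, ..., R^|S|."""
    result, current = set(), identity(S)
    for _ in range(len(S)):
        current = compose(R, current)
        result |= current
    return result

def star(R, S):
    """R^* = id_S union R^+."""
    return identity(S) | plus(R, S)

S = {1, 2, 3}
R = {(1, 2), (2, 3)}
print(inverse(R))      # the pairs (2, 1) and (3, 2)
print(power(R, S, 2))  # the single pair (1, 3)
print(plus(R, S))      # the pairs (1, 2), (2, 3), and (1, 3)
```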

9.2.2. Properties of Binary Relations on a Set

In this subsection we define five properties that a relation may satisfy.

Definitions

Let $R$ be a binary relation on the set $S.$

$R$ is reflexive if and only if for all $a \in S,$ $(a, a) \in R.$
$R$ is irreflexive if and only if for all $a \in S,$ $(a, a) \not\in R.$
$R$ is symmetric if and only if for all $a \in S$ and $b \in S,$ $(a, b) \in R \rightarrow (b,a) \in R.$

$R$ is antisymmetric if and only if for all $a \in S$ and $b \in S,$ $(a, b) \in R \land (b, a) \in R \rightarrow a = b.$
Equivalently, $R$ is antisymmetric if and only if for all $a \in S$ and $b \in S,$ $(a, b) \in R \land a \neq b \rightarrow (b,a) \not\in R.$
$R$ is transitive if and only if for all $a \in S,$ $b \in S,$ and $c \in S,$ $(a, b) \in R \land (b, c) \in R \rightarrow (a,c) \in R.$

The following theorem can make it easier to determine when a relation has each of the five properties. The proof of the theorem is an exercise.

Theorem

Let $R$ be a binary relation on the set $S.$

$R$ is reflexive if and only if $\mathbf{id}_S \subseteq R.$
$R$ is irreflexive if and only if $\mathbf{id}_S \cap R = \emptyset.$
$R$ is symmetric if and only if $R^{-1} = R.$
$R$ is antisymmetric if and only if $R^{-1} \cap R \subseteq \mathbf{id}_S.$
$R$ is transitive if and only if $R^{2} \subseteq R.$
Recall that $R^{2}$ is defined to be the composition $R \circ R.$
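The set-based characterizations in this theorem translate almost word for word into Python set operations, at least for relations on finite sets. The sketch below uses hypothetical function names, and the two test relations (the order $\leq$ on a small set of integers, and the rock-paper-scissors relation from an earlier example) are chosen only for illustration.

```python
def is_reflexive(R, S):
    return {(x, x) for x in S} <= R          # id_S is a subset of R

def is_irreflexive(R, S):
    return not ({(x, x) for x in S} & R)     # id_S and R have empty intersection

def is_symmetric(R):
    return {(b, a) for (a, b) in R} == R     # R^{-1} equals R

def is_antisymmetric(R, S):
    inv = {(b, a) for (a, b) in R}
    return (inv & R) <= {(x, x) for x in S}  # R^{-1} intersect R is a subset of id_S

def is_transitive(R):
    R2 = {(a, c) for (a, b) in R for (b2, c) in R if b == b2}
    return R2 <= R                           # R^2 is a subset of R

S = {1, 2, 3, 4}
leq = {(a, b) for a in S for b in S if a <= b}

T = {"rock", "paper", "scissors"}
L = {("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")}

print(is_reflexive(leq, S), is_antisymmetric(leq, S), is_transitive(leq))  # True True True
print(is_irreflexive(L, T), is_symmetric(L), is_transitive(L))             # True False False
```

The relation $L$ fails transitivity because rock loses to paper and paper loses to scissors, but rock does not lose to scissors.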

9.2.3. Closures of Binary Relations with Respect to a Property

For each of the properties reflexivity, symmetry, and transitivity, we define the closure with respect to the property of a relation $R$ as follows: The closure is the smallest relation that has the property and includes all the elements of $R.$ That is, you start with $R$ and insert just enough ordered pairs, if any are needed, to make sure that the new relation has the desired property.

The following theorem justifies that the reflexive closure, symmetric closure, and transitive closure exist for any relation $R.$ The proof of the theorem is an exercise.

Theorem

Let $R$ be a binary relation on the set $S.$

The reflexive closure of $R$ is the relation $R \cup \mathbf{id}_S.$
The symmetric closure of $R$ is the relation $R \cup R^{-1}.$
The transitive closure of $R$ is the relation $R^{+}.$

Notice that we can also define the reflexive and transitive closure of a relation $R$ as the relation $R^{*},$ which is the reflexive closure of the transitive closure of $R.$
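For a relation on a finite set, the three closures can be computed directly. The following sketch uses hypothetical function names. Note that the transitive closure here is computed by a different but equivalent route from the formula $R^{+}$: it repeatedly adds the pairs forced by transitivity until nothing new appears, which on a finite set produces the same relation.

```python
def reflexive_closure(R, S):
    return R | {(x, x) for x in S}           # R union id_S

def symmetric_closure(R):
    return R | {(b, a) for (a, b) in R}      # R union R^{-1}

def transitive_closure(R):
    closure = set(R)
    while True:
        # Pairs (a, c) forced by transitivity from the pairs found so far.
        forced = {(a, c) for (a, b) in closure for (b2, c) in closure if b == b2}
        if forced <= closure:
            return closure                   # nothing new to add, so we are done
        closure |= forced

S = {1, 2, 3, 4}
R = {(1, 2), (2, 3), (3, 4)}

print(sorted(symmetric_closure(R)))
print(sorted(transitive_closure(R)))
print(sorted(reflexive_closure(transitive_closure(R), S)))  # this is R^*
```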

However, for some properties, the closure of a relation $R$ with respect to the property may not exist!

Informal Exercise

The irreflexive closure and antisymmetric closure only exist if $R$ satisfies certain conditions.

Find a description of the relations $R$ that do have an irreflexive closure.
Find a description of the relations $R$ that do have an antisymmetric closure.

Hint

Use the theorem from the previous subsection that describes irreflexive relations and antisymmetric relations in terms of intersections of sets.

9.3. Equivalence Relations

A binary relation $R$ on a set $S$ is called an equivalence relation on $S$ if $R$ is reflexive, symmetric, and transitive.

A first example of an equivalence relation is the diagonal, that is, the equality relation. Another example is given below.

Example 3 - The Parity Relation on the Integers

Consider the set $R = \{ (a,b) \in \mathbb{Z} \times \mathbb{Z} \, | \, \text{Both } a \text{ and } b \text{ are odd, or both } a \text{ and } b \text{ are even.} \}.$

Let’s show that $R$ is an equivalence relation.

$R$ is reflexive, since $aRa$ for every $a \in \mathbb{Z}.$ That is, both $a$ is odd and $a$ is odd, or both $a$ is even and $a$ is even (since $p \land p \leftrightarrow p$ is a tautology for any propositional variable $p.$)
$R$ is symmetric, since $aRb$ implies $bRa$ for every pair $a, b \in \mathbb{Z}.$ That is, both $a$ and $b$ are odd whenever both $b$ and $a$ are odd, and both $a$ and $b$ are even whenever both $b$ and $a$ are even (since $p \land q \leftrightarrow q \land p$ is a tautology for any propositional variables $p$ and $q.$)
$R$ is transitive, since $aRb$ and $bRc$ implies $aRc$ for every triple $a, b, c \in \mathbb{Z}.$ That is, if both $a$ and $b$ are odd and both $b$ and $c$ are odd, then both $a$ and $c$ are odd, and if both $a$ and $b$ are even and both $b$ and $c$ are even, then both $a$ and $c$ are even (since $(p \land q) \land (q \land r) \rightarrow (p \land r)$ is a tautology for any propositional variables $p,$ $q,$ and $r.$)

It is not difficult to see that this relation can also be defined as $R = \{ (a,b) \in \mathbb{Z} \times \mathbb{Z} \, | \, 2 \text{ is a divisor of } (a-b) \}.$ So this relation is the same as the $\equiv_{2}$ relation discussed in an earlier example.

Given an equivalence relation $R$ on the set $S$ and an element $x \in S,$ we define the equivalence class of $x$ to be $[ x ]_{R} = \{ y \in S \, | \, (x,y) \in R \}.$

Theorem

Let $R$ be a binary relation on the set $S.$

If $R$ is an equivalence relation then the set of all equivalence classes $\{ [ x ]_{R} \, | \, x \in S \}$ is a partition of $S.$

Conversely, if $\Pi$ is a partition of $S$, then the relation defined by $R = \{ (x, y) \, | \, x \text{ and } y \text{ are elements of the same subset in } \Pi \}$ is an equivalence relation.
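Both directions of this theorem can be illustrated on a small finite set. The sketch below is only an illustration, not a proof: the function names are hypothetical, and the parity relation is restricted to the integers from $-3$ to $3.$

```python
def equivalence_class(x, R, S):
    """[x]_R: all y in S with (x, y) in R."""
    return frozenset(y for y in S if (x, y) in R)

def partition_from_relation(R, S):
    """The set of all equivalence classes of R."""
    return {equivalence_class(x, R, S) for x in S}

def relation_from_partition(blocks):
    """Relate x and y exactly when they lie in the same block."""
    return {(x, y) for block in blocks for x in block for y in block}

S = set(range(-3, 4))
parity = {(a, b) for a in S for b in S if (a - b) % 2 == 0}

blocks = partition_from_relation(parity, S)
for block in blocks:
    print(sorted(block))  # one line for the odd numbers, one for the even numbers

# Going back from the partition recovers the original relation.
print(relation_from_partition(blocks) == parity)  # True
```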

9.4. Order Relations on a Set

It is often useful to be able to compare elements of a set, based on some key property. In this section, several examples of such order relations will be discussed.

9.4.1. Partial Orderings

A binary relation $R$ on a set $S$ is called a partial order on $S$ if $R$ is reflexive, antisymmetric, and transitive.

Example 4 - Partial Orders

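For any set $S,$ the relation $\subseteq$ is a partial order on $\mathcal{P}(S),$ the power set of $S.$ The proper subset relation $\subset$ is antisymmetric and transitive, but it is irreflexive rather than reflexive, so it does not satisfy the definition above; a relation of this kind is often called a strict partial order.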

Figure: Hasse diagram of the subset lattice for a set with 3 members.

As an example, for the set $S = \{ r, g, b \},$ the image shows a Hasse diagram that represents the $\subset$ relation on the power set $\mathcal{P}(S).$ Notice that each line segment in the Hasse diagram connects a first subset of $S$ to a second subset of $S$ that is “immediately above” the first subset in the following sense: Each member of the first subset is also a member of the second subset, but the second subset has one additional member. In the Hasse diagram, two different subsets $A$ and $B$ satisfy $A \subset B$ if and only if there is a “path up” from $A$ to $B$ that uses one or more of the line segments.
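The line segments of the Hasse diagram described above can be generated with a short Python sketch. The variable names are chosen for illustration; the sketch relies on the fact that Python's `<` operator on `frozenset` values tests for a proper subset.

```python
from itertools import combinations

S = ["r", "g", "b"]

# All subsets of S, that is, the elements of the power set P(S).
subsets = [frozenset(c) for k in range(len(S) + 1) for c in combinations(S, k)]

# One line segment for each pair (A, B) where B is "immediately above" A:
# A is a proper subset of B and B has exactly one additional member.
edges = [(A, B) for A in subsets for B in subsets
         if A < B and len(B) == len(A) + 1]

print(len(subsets))  # number of subsets of S
print(len(edges))    # number of line segments in the Hasse diagram
for A, B in edges:
    print(sorted(A), "is immediately below", sorted(B))
```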

Total Orderings

A total ordering of a set $S$ is a partial order $R$ on $S$ that has the additional property $(\forall x \in S)(\forall y \in S)(xRy \lor yRx).$

Example 5 - Total Orderings

For the set of real numbers $\mathbb{R}$ the usual order relations $\leq$ and $\geq$ are total orders.

Well-Orderings

A well-ordering of a set $S$ is a total ordering $R$ on $S$ that has the additional property that every nonempty subset of $S$ contains a least element with respect to the order relation.

One of the most important examples of a well-ordering in mathematics is the relation $\leq$ on the set of natural numbers $\mathbb{N}.$

The Well-Ordering Principle for $\mathbb{N}$

If $S$ is a nonempty subset of $\mathbb{N}$ then there is a natural number $m \in S$ such that for every element $x \in S,$ $m \leq x.$

In the Remix, the Well-Ordering Principle is treated as an axiom, a statement that is assumed to be True about the set of natural numbers.

Note to instructors

Many textbooks assume either the Well-Ordering Principle or the Principle of Mathematical Induction as an axiom, and then prove the other principle using that axiom (and also prove that the two principles are logically equivalent.) As the Remix is designed for students of Computer Science, both principles are simply assumed to be True without proof, that is, both principles are treated as axioms. This choice should not limit student understanding of how these principles can be applied, and also avoids some highly technical issues that arise in formal mathematical definitions of the set $\mathbb{N}$ of natural numbers which would most likely only be of concern to Mathematics majors and their instructors. If you want to do a deeper mathematical exploration of these technical issues (e.g., differing formulations of the Peano axioms, and first-order versus second-order logic) you may refer to this article in The Mathematical Intelligencer as well as the subsection “Nonstandard models” of this Wikipedia page.

The Well-Ordering Principle may seem like an “obviously True” statement. To understand why mathematicians explicitly state that they are assuming this principle is True, it helps to think about what “infinity” means.

First, suppose that you are told that the subset $A$ of $\mathbb{N}$ contains the element 10. You can use a brute‑force method to determine the least element of $A:$ Just ask the following sequence of 10 questions, in the order shown. \[ \text{“Is 0 in $A?,$” “Is 1 in $A?,$” $\ldots$, “Is 9 in $A?$”} \] If the answer is “No” to all 10 of these questions, then 10 itself is the least element in $A$; otherwise, the least number in $A$ is the first value of $k$ for which the answer to the question “Is $k$ in $A?$” is “Yes.”
Next, suppose that you are given a new set $S$ and told that the number $10^{10^{10^{10^{10}}}}$ is in $S.$ You again could try to use the brute‑force method of asking the sequence of $10^{10^{10^{10^{10}}}}$ questions: \[ \text{“Is 0 in $S?,$” “Is 1 in $S?,$” $\ldots$, “Is $\left( 10^{10^{10^{10^{10}}}}-1 \right)$ in $S?$”} \] but even if it took only 1 nanosecond to ask and answer each question, the integer $10^{10^{10^{10^{10}}}}$ is greater than the number of nanoseconds estimated for our universe to reach its final energy state! That is, it is possible that the answer to one of these questions is “Yes” but that you would never be able to ask that question before the universe ends! Notice that you can use formal logic to justify that either the answers to all of those questions would be “No” or that at least one of the questions would have the answer “Yes” — asking the sequence of questions would determine the least element in $S$ if you had time to ask enough of the questions, but you (and humanity itself) may not have that much time. For this reason, mathematicians assume that the least element of the set $S$ exists.
Notice that a “timeless being” could know all the answers.

9.5. Modular Arithmetic (Revision in Progress!)

Let $m$ stand for some positive integer constant that is greater than 1. The relation congruence modulo $m$ is defined to be \[\{ (a, b) \in \mathbb{Z} \times \mathbb{Z} \, : \, m \text{ divides } (a-b) \}.\] The symbol $\equiv_{m}$ is used to represent congruence modulo $m$, so \[a \equiv_{m} b \text{ if and only if } m \text{ divides } (a-b).\]

As an example, $13 \equiv_{3} 7$ because $3$ divides $13-7.$ Another way to see that $13 \equiv_{3} 7$ is to notice that both $13$ and $7$ leave a remainder of $1$ when divided by $3$ using integer long division.
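Both ways of checking a congruence can be sketched in Python. The function name `congruent` is a hypothetical helper chosen for this illustration. Python's `%` operator returns a remainder in the range $0$ to $m-1$ when $m$ is positive, even for negative integers, which matches the remainder from integer division used above.

```python
def congruent(a, b, m):
    """True exactly when m divides (a - b)."""
    return (a - b) % m == 0

print(congruent(13, 7, 3))  # True, since 3 divides 13 - 7 = 6
print(13 % 3, 7 % 3)        # both remainders are 1
print(congruent(13, 8, 3))  # False, since 3 does not divide 5
print(congruent(-2, 1, 3))  # True, since 3 divides -3
```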

Notice that the relation $\equiv_{2}$ is the same as the parity relation discussed in an example earlier in this chapter. As discussed in that example, $\equiv_{2}$ is an equivalence relation. In fact, for each positive integer constant $m$ that is greater than 1, the relation $\equiv_{m}$ is an equivalence relation.

Theorem

If $m$ is a positive integer greater than 1, then $\equiv_{m}$ is an equivalence relation on $\mathbb{Z}.$

Proof

One way to prove this would be to use steps similar to the ones in the parity relation example to show that $\equiv_{m}$ is reflexive, symmetric, and transitive, but we’ll use a different way to prove this theorem: We will show that the relation $\equiv_{m}$ describes a way to partition $\mathbb{Z}$ into $m$ disjoint subsets, then apply the theorem that states that such a partition corresponds to an equivalence relation.

You’ve already seen how the partition of $\mathbb{Z}$ into the odd integers and the even integers corresponds to the equivalence relation $\equiv_{2}.$ Before giving a formal proof, let’s look at another example. If $m = 3,$ then every integer $a$ can be written in the form $a = q \cdot 3 + r$ where $q$ is an integer and $r$ is one of the integers in the set $\{ 0, 1, 2 \}.$ You can partition $\mathbb{Z}$ into the three subsets listed below: \[\{ \ldots, -3, 0, 3, 6, \ldots \},\] \[\{ \ldots, -2, 1, 4, 7, \ldots \},\] \[\{ \ldots, -1, 2, 5, 8, \ldots \}.\]

Notice that each integer is in exactly one of those three subsets, so the subsets form a partition of $\mathbb{Z}.$ It’s not difficult to see that two integers are members of the same subset of the partition if and only if $3$ divides their difference, so the subsets of the partition are the equivalence classes of the relation $\equiv_{3},$ that is, $[ 0 ]_{\equiv_{3}},$ $[ 1 ]_{\equiv_{3}},$ and $[ 2 ]_{\equiv_{3}}.$

We will now present the proof for the general case. Notice that if $a$ is any integer, you can use the division algorithm to find the quotient and remainder such that $a = q \cdot m + r$, where $q$ and $r$ are integers and $0 \leq r < m.$ This shows that each integer $a$ is congruent modulo $m$ to exactly one of the integers in the set $\{ 0, 1, \ldots, m-1 \},$ which means that the $m$ sets \[\{ \ldots, -m, 0, m, \ldots \},\] \[\{ \ldots, -m+1, 1, m+1, \ldots \},\] \[\{ \ldots, -m+2, 2, m+2, \ldots \},\] \[ \vdots \] \[\{ \ldots, -2, m-2, m+(m-2), \ldots \},\] \[\{ \ldots, -1, m-1, m+(m-1), \ldots \}\] form a partition of $\mathbb{Z}.$ Two integers are members of the same set in this partition if and only if $m$ divides their difference, so by the theorem stated earlier in the chapter, $\equiv_{m}$ is an equivalence relation, and the set of equivalence classes $\{ [ 0 ]_{\equiv_{m}}, \, [ 1 ]_{\equiv_{m}}, \, [ 2 ]_{\equiv_{m}}, \, \ldots, \, [ m-1 ]_{\equiv_{m}} \}$ is the partition of $\mathbb{Z}$ that corresponds to the relation $\equiv_{m}.$

We can also do “arithmetic” directly with the equivalence classes of the $\equiv_{m}$ relation. The following example may make this more clear.

Example 6 - Arithmetic with Even and Odd Numbers

You likely learned, when you were quite young, that some integers are called "even" and other integers are called "odd."

Notice that every integer is either even or odd but not both, which means that the set \[\{ \text{the set of all even integers}, \, \text{the set of all odd integers} \}\] or more formally \[\{ \{ \ldots, -4, -2, 0, 2, 4, \ldots \}, \, \{ \ldots, -3, -1, 1, 3, \ldots \} \}\]forms a partition of the set $\mathbb{Z}$ of integers. This partition corresponds to the relation $\equiv_{2},$ that is, congruence modulo 2. The set of all even numbers is the equivalence class $[ 0 ]_{\equiv_{2}}$ and the set of all odd numbers is the equivalence class $[ 1 ]_{\equiv_{2}}.$

You may have learned how to do arithmetic with "Even" and "Odd," too, as shown in the tables.

When you did arithmetic with "Even" and "Odd," you were really doing arithmetic with the equivalence classes $[ 0 ]_{\equiv_{2}}$ and $[ 1 ]_{\equiv_{2}}.$ For example, it will not matter which two odd numbers you add, the result must be even because the two numbers were odd. You can just do the operations on the remainders that you get after dividing by 2, which corresponds to adding the equivalence classes: $[ 1 ]_{\equiv_{2}} + [ 1 ]_{\equiv_{2}} = [ 1+1 ]_{\equiv_{2}} = [ 2 ]_{\equiv_{2}} = [ 0 ]_{\equiv_{2}}$ where the last equality is True because $2$ and $0$ belong to the same equivalence class (that is, they are both even numbers.)
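The idea of doing the operations on the remainders can be sketched in Python by building the "Even" and "Odd" addition and multiplication tables from the remainders $0$ and $1.$ The dictionary names below are chosen for illustration only.

```python
m = 2
names = {0: "Even", 1: "Odd"}  # names for the classes [0] and [1]

add_table = {}
mul_table = {}
for r in range(m):
    for s in range(m):
        # Operate on the remainders, then reduce modulo m to name the class of the result.
        add_table[(names[r], names[s])] = names[(r + s) % m]
        mul_table[(names[r], names[s])] = names[(r * s) % m]

for (x, y), result in add_table.items():
    print(f"{x} + {y} = {result}")
for (x, y), result in mul_table.items():
    print(f"{x} x {y} = {result}")
```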

The following theorem shows that you can do addition and multiplication with the remainders (or, what amounts to the same thing, the equivalence classes) in the same way as was done with Evens and Odds in the previous example for the relation $\equiv_{m},$ where $m$ can be any integer greater than 1.

Theorem

If $m$ is an integer greater than 1, and $a,$ $b,$ $c,$ and $d$ are integers, and $a \equiv_{m} b$ and $c \equiv_{m} d,$
then $a + c \equiv_{m} b + d$ and $a \cdot c \equiv_{m} b \cdot d.$

Proof

Assume that $m,$ $a,$ $b,$ $c,$ and $d$ are integers, $m > 1,$ $a \equiv_{m} b,$ and $c \equiv_{m} d.$ This means that $m$ is a divisor of both $(a-b)$ and $(c-d),$ that is, both $(a-b)$ and $(c-d)$ are multiples of $m.$

The sum $(a-b) + (c-d)$ must also be a multiple of $m,$ and this sum can be rewritten using properties of addition as $(a+c) - (b+d).$ This shows that $m$ is a divisor of $(a+c) - (b+d),$ which can also be stated as \[(a+c) \equiv_{m} (b+d).\]

The expressions $(a-b) \cdot c$ and $b \cdot (c-d)$ must be multiples of $m,$ and the sum of those expressions is $(a-b) \cdot c + b \cdot (c-d),$ which can be simplified using properties of multiplication and addition to $(a \cdot c) - (b \cdot d).$ This shows that $m$ is a divisor of $(a \cdot c) - (b \cdot d),$ which can also be stated as \[(a \cdot c) \equiv_{m} (b \cdot d).\]

Q.E.D.
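A numerical spot check is no substitute for the proof above, but it can make the theorem more concrete. The sketch below tests the conclusion for every choice of $a,$ $b,$ $c,$ and $d$ in a small range of integers; the function names and the range are assumptions made for this illustration.

```python
def congruent(a, b, m):
    """True exactly when m divides (a - b)."""
    return (a - b) % m == 0

def check_theorem(m, values):
    """Return True if the theorem's conclusion holds for all a, b, c, d in values."""
    for a in values:
        for b in values:
            if not congruent(a, b, m):
                continue
            for c in values:
                for d in values:
                    if congruent(c, d, m):
                        if not congruent(a + c, b + d, m):
                            return False
                        if not congruent(a * c, b * d, m):
                            return False
    return True

# One concrete instance: 9 and 21 are congruent modulo 12, and so are 5 and 17.
print(congruent(9 + 5, 21 + 17, 12))  # True, since 38 - 14 = 24
print(congruent(9 * 5, 21 * 17, 12))  # True, since 357 - 45 = 312 = 26 * 12

# Check every combination drawn from -6..6 for the moduli 2 through 6.
print(all(check_theorem(m, range(-6, 7)) for m in range(2, 7)))  # True
```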

For example, we can write $9 + 5 \equiv_{12} 2$ (This is an example of "clock arithmetic" using a 12-hour clock: 5 hours after 9 o’clock will be 2 o’clock.)

Example 7 - Solving a linear congruence

If it is 12 o’clock now, what is the least number of 7-hour intervals that must pass before the clock will read 4 o’clock?

This question is equivalent to finding the smallest natural number $n$ that solves the linear congruence $7n \equiv_{12} 4.$

One way to solve this congruence is to treat this as a "clock arithmetic" problem:
After one 7-hour interval passes, the clock will read 7 o’clock.
After two 7-hour intervals pass, the clock will read 2 o’clock, because $7+7=14$ and $14 \equiv_{12} 2.$
After three 7-hour intervals pass, the clock will read 9 o’clock, because $2+7 = 9.$
After four 7-hour intervals pass, the clock will read 4 o’clock, because $9+7 = 16$ and $16 \equiv_{12} 4.$
So the least number of 7-hour intervals that must pass before the clock will read 4 o’clock is four.

Another way to solve this congruence is to consider the remainders of natural number-multiples of 7 after dividing by 12:
$1 \cdot 7 = 7$ and $7 \equiv_{12} 7$
$2 \cdot 7 = 14$ and $14 \equiv_{12} 2$ since $14 = 1 \cdot 12 + 2$
$3 \cdot 7 = 21$ and $21 \equiv_{12} 9$ since $21 = 1 \cdot 12 + 9$
$4 \cdot 7 = 28$ and $28 \equiv_{12} 4$ since $28 = 2 \cdot 12 + 4$
Since $4 \cdot 7 \equiv_{12} 4,$ four 7-hour intervals must pass before the clock will read 4 o’clock.
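The second approach in this example is a brute-force search, which can be sketched in Python. The function name `smallest_solution` is hypothetical. The sketch assumes that it is enough to try $n = 0, 1, \ldots, m-1,$ which follows from the theorem above: if $n \equiv_{m} n'$ then $a \cdot n \equiv_{m} a \cdot n',$ so the remainders of the multiples repeat after $m$ steps.

```python
def smallest_solution(a, b, m):
    """Smallest natural number n with a*n congruent to b modulo m, or None if there is none."""
    for n in range(m):
        if (a * n - b) % m == 0:
            return n
    return None  # no n in 0..m-1 works, so no natural number works

print(smallest_solution(7, 4, 12))  # 4, matching the example above
print(smallest_solution(2, 1, 4))   # None, since 2n is even and is never 1 more than a multiple of 4
```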

You try

Find the smallest natural number $n$ such that $3n \equiv_{11} 5.$
Explain why there is no natural number $n$ that solves the congruence $4n \equiv_{10} 5.$

MORE TO COME!

10. Counting: Permutations and Combinations

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on March 17, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

Many problems can be solved by counting the number of possible outcomes when choosing $r$ elements from a set that contains $n$ elements, with no repetition allowed (that is, each element can only be counted once). To be clear, this assumes that $r$ and $n$ are natural numbers and $r \leq n.$

Some of these problems involve choosing an ordered sequence of $r$ elements while other problems involve choosing a subset of $r$ elements.

This chapter presents techniques for doing this type of counting. These techniques are built on the ones you studied in the Counting: Arithmetic Techniques chapter.

Key terms and concepts covered in this chapter:

Permutations and combinations
- Basic definitions
- Pascal’s identity
- The binomial theorem
binomials

10.1. The Factorial Of A Number

The factorial is a function defined for all natural numbers as described below.

Definition - The Factorial

Given a natural number $n,$ the function $n!$ is defined recursively as \[0! = 1\] \[ n! = n \cdot (n-1)! \text{ for } n \geq 1\]

That is, \[n! = n(n-1)(n-2) \cdots (2)(1)(1).\] This function is called the factorial of $n$ and also "$n$-factorial."
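The recursive definition translates directly into a recursive Python function. This is a minimal sketch for small natural numbers; the function name `factorial` is chosen here, and the sketch does not check for negative or non-integer input.

```python
def factorial(n):
    """Compute n! for a natural number n using the recursive definition."""
    if n == 0:
        return 1                     # base case: 0! = 1
    return n * factorial(n - 1)      # recursive step: n! = n * (n-1)!

print(factorial(0))  # 1
print(factorial(5))  # 120, since 5 * 4 * 3 * 2 * 1 * 1 = 120
```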

10.2. Permutations

A permutation of a set of elements is an ordered arrangement of the elements without repetition of elements. A permutation may involve every element of the set, or only some of the elements of the set. That is, a permutation is a sequence of elements from the set, where no element can appear more than once in the sequence.
Note: If you studied outside the USA, you may have learned that a permutation must involve every element of the set, and that the term variation is used when the arrangement involves only some (but not all) of the elements of the set. In this textbook, all of these are called permutations.

Consider the last four uppercase English letters $W, X, Y, Z$.

$WXYZ, XWYZ, WXZY$ are permutations of the letters taken four at a time
$XZW, WXY, WXZ$ are permutations of the letters taken three at a time
$WX, WY, ZX$ are permutations of the letters taken two at a time

Note: To be more formal, the sequences above could be written as tuples of the form $(W, X, Y, Z),$ $(X, W, Y, Z),$ $(W, X, Z, Y),$ and so on, but the notation used is simpler without the extra symbolic clutter of parentheses and commas.

Notice that no letter is repeated in a permutation.

How can you count all the possible permutations of a set? For small sets like the set of the last four uppercase English letters, you could list all the possible permutations of $W, X, Y, Z$ to determine that there are 24 permutations of the letters taken four at a time, 24 permutations of the letters taken three at a time, and 12 permutations of the letters taken two at a time. But what if you wanted to find all the possible permutations of all twenty six uppercase English letters $A, B, \ldots , W, X, Y, Z$? It would be very time consuming to write out all possible permutations of, say, the twenty six uppercase English letters taken twenty one at a time, let alone all the other possible permutations. We will develop a technique for doing this kind of counting now.

Definition

Given a set of $n$ elements, an ordered arrangement of $r \le n$ of the elements is called an $r$-permutation or a permutation of $n$ elements taken $r$ at a time.

The notation $P(n,r)$ represents the number of permutations of $n$ elements taken $r$ at a time. Note that $_nP_r$ is another commonly-used notation for this count.

Suppose you have a set that contains $n$ elements and want to construct a $2$-permutation of the elements. There are $n$ possible choices for the first element, and once that element is chosen, there are $n-1$ possible choices for the second element. The product rule lets you conclude that there are $n(n-1)$ ways to choose the two elements, taking into account that the order of the choices matters.

Now we can generalize this argument, informally: Suppose you have a set that contains $n$ elements and want to construct an $r$-permutation, where $r \le n.$ There are $n$ possible choices for the first element, $n-1$ possible choices for the second element, and so on, until we have $n-(r-1)$ possible choices for the $r\text{th}$ element. Apply the product rule repeatedly to conclude that there are $ n(n-1)\cdots (n-r+1)$ $r$-permutations of the $n$ elements.

The process in the previous paragraph can also be viewed recursively. Suppose you have a set that contains $n$ elements and want to construct an $r$-permutation, where $r \le n.$ There are $n$ possible choices for the first element, and the product rule lets us conclude that the number of $r$-permutations of the $n$ elements, $P(n,r),$ is the product of $n$ and $P(n-1,r-1).$ Noticing that $P(n-r,0) = 1$ lets us draw the same conclusion as in the previous paragraph: There are $P(n,r) = n \cdot \left((n-1)\cdots (n-r+1) \right)$ permutations of $n$ elements taken $r$ at a time.

Theorem

For natural numbers $r$ and $n$ with $r \leq n,$ \[P(n,r) = n(n-1)(n-2) \cdots (n-r+1).\]

Example 8 - Permutations of 5 elements taken 3 at a time

How many lines of output will be printed by the following code?

You could trace through the code using the Next button to answer the question, but it would be tedious… try using the formula for $r$-permutations instead… and remember to count the final print statement after the loop, too.
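The interactive code listing for this example is not reproduced in this copy of the text. As a stand-in, here is a minimal sketch of the kind of code the question describes. The element names, the nested-loop structure, and the final message are all assumptions made for this illustration, not the original listing.

```python
elements = ["a", "b", "c", "d", "e"]  # a hypothetical set of 5 elements

lines = []
for x in elements:
    for y in elements:
        for z in elements:
            # Keep only arrangements in which no element is repeated.
            if x != y and x != z and y != z:
                lines.append(f"{x} {y} {z}")

lines.append("Done")  # the final print statement after the loop

for line in lines:
    print(line)
```

Each printed arrangement is a permutation of the 5 elements taken 3 at a time, so the number of lines can be predicted without running the code.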


Video Example

The following video example features Dr. Joshua Roberts, Associate Professor of Mathematics at Georgia Gwinnett College.

Example 9 - Counting Permutations

The code below calculates the number of permutations given $n$ and $r$. Try to predict the variable names, values, and data types at different steps in the execution. Use the Next button to check your answers.
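The interactive code listing for this example is not reproduced in this copy of the text. The sketch below shows one way such a function could be written. Only the name `ProdCount` comes from the text; the body, which multiplies the $r$ factors $n, n-1, \ldots, n-r+1$ in a loop, is an assumption made for this illustration.

```python
def ProdCount(n, r):
    """Count the r-permutations of n elements as the product n(n-1)...(n-r+1)."""
    count = 1
    for i in range(r):
        count = count * (n - i)  # factors are n, n-1, ..., n-r+1
    return count

print(ProdCount(5, 3))   # 60, since 5 * 4 * 3 = 60
print(ProdCount(10, 3))  # 720, since 10 * 9 * 8 = 720
```

When $r = 0$ the loop body never runs and the function returns 1, which agrees with $P(n,0) = 1.$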


Question

How many permutations are there of the twenty six uppercase English letters taken twenty one at a time?

Hint

Edit the code so that it will compute ProdCount(26,21), a relatively "small" natural number… "small" meaning that its decimal expansion has only 25 digits! 😎

Here is another way to think about counting the number of permutations of $n$ elements taken $r$ at a time: Imagine writing down all possible permutations of $n$ elements taken $n$ at a time, that is, all possible ordered arrangements of all $n$ elements. Notice that if we only care about the first $r$ elements in the list, then there are $n-r$ elements at the end of the list that we can rearrange in any of $(n-r)!$ ways without changing the order of the first $r$ elements. Now apply the division rule from the Counting: Arithmetic Techniques chapter: We have a procedure, ordering all $n$ elements, that can be completed in $n!$ possible ways, but for each way of completing this procedure there are $(n-r)!$ possible ways with the same outcome for ordering the first $r$ elements. The division rule lets you conclude that there are $\frac{n!}{(n-r)!}$ ways to order the first $r$ elements.

Theorem

For natural numbers $r$ and $n$ with $r \leq n,$ \[P(n,r) = \displaystyle \frac{n!}{(n-r)!}\]
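The division-rule argument can be checked by brute force for small values. The sketch below is an illustration only, with $n = 5$ and $r = 3$ chosen arbitrarily: it groups every full arrangement of the $n$ elements by its first $r$ entries and confirms that each group has $(n-r)!$ members.

```python
from collections import Counter
from itertools import permutations
from math import factorial

n, r = 5, 3
elements = range(n)

# Group every full arrangement of the n elements by its first r entries.
groups = Counter(arrangement[:r] for arrangement in permutations(elements))

# Every group should contain exactly (n-r)! arrangements ...
print(set(groups.values()) == {factorial(n - r)})

# ... so the number of groups should equal n! / (n-r)!.
print(len(groups), factorial(n) // factorial(n - r))
```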

Video Example

The following video example features Dr. Joshua Roberts, Associate Professor of Mathematics at Georgia Gwinnett College.

Example 10

If there are 10 runners in a race, how many different ways can the gold, silver, and bronze medals be awarded?

Solution

There are 10 elements (the runners) and we are choosing 3 to win medals. So the number of ways they can be awarded is

$P(10,3) = \displaystyle \frac{10!}{(10-3)!} = \frac{10!}{7!} = 10 \times 9 \times 8 = 720$

Alternatively, we could have used the product rule and noticed that there are 10 ways to award the gold, then there are 9 ways to award the silver, then 8 ways to award the bronze.

You Try

From a group of 100 workers in a union, how many ways can there be a union President, Vice President, and Treasurer?

Example 11

How many permutations of the digits 0123456789 are there that contain the string 456?

Solution

We can regard this as a permutation of 8 elements: the string "456" and the other 7 individual digits. These 8 elements can occur in any order, so the number of permutations is

$P(8,8) = 8! = 40320.$
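The "treat the string as a single element" idea can be checked by brute force on a scaled-down version of the problem. The sketch below uses the seven digits 0123456 instead of all ten (an assumption made only to keep the run time short), so the block "456" and the four remaining digits form 5 elements.

```python
from itertools import permutations
from math import factorial

digits = "0123456"  # scaled-down digit set, chosen so brute force is quick

# Count the arrangements in which 4, 5, 6 appear consecutively and in that order.
count = sum(1 for p in permutations(digits) if "456" in "".join(p))

# Treating "456" as one element leaves 5 elements to arrange.
print(count, factorial(5))
```

Replacing `digits` with `"0123456789"` would test the original example, at the cost of looping over all $10!$ arrangements.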

Challenge

How many different six-letter words (including nonsense words), can be formed using the letters in "HEROIC" where the vowels must appear together and no letters are repeated?

Hint

You can regard this as a permutation of the 3 consonants and a string of 3 consecutive vowels… but notice that the order of the vowels matters.

Example 12 - Python functions for Factorial and Counting Permutations

This code sample illustrates how to use functions in Python’s math module to compute $n!$ and $_nP_r.$
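A minimal sketch of such a sample follows; it is not necessarily the code that was originally embedded here. It uses `math.factorial` and `math.perm` (the latter requires Python 3.8 or later), with $n = 10$ and $r = 3$ taken from the medal example above.

```python
import math

n, r = 10, 3

print(math.factorial(n))                           # n!
print(math.perm(n, r))                             # number of r-permutations of n elements
print(math.factorial(n) // math.factorial(n - r))  # the same count, via n! / (n-r)!
```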


10.3. Combinations

A selection of elements from a set of $n$ elements where order of selection does not matter is called a combination. Notice that each combination corresponds to a subset of the set of $n$ elements.

Consider the letters $W, X, Y, Z$ and choose three at a time where the order does not matter. In this case we are selecting a subset of size three instead of a sequence of length three. The sets $\{ W, X, Y \}$ and $\{ X, Y, W \}$ are just two ways of describing the same set since the order in which elements of a set are listed does not matter.

There are four possible ways to choose three letters without regard to order: $\{ W, X, Y \}$, $\{ W, X, Z \}$, $\{ W, Y, Z \}$, and $\{ X, Y, Z \}$.
There are six possible ways to choose two letters without regard to order: $\{ W, X \}$, $\{ W, Y \}$, $\{ W, Z \}$, $\{ X, Y \}$, $\{ X, Z \}$, and $\{ Y, Z \}$.
There are four possible ways to choose one letter (without regard to order): $\{ Z \}$, $\{ Y \}$, $\{ X \}$, and $\{ W \}$. (Keep reading to find out why this "reversed" ordering of the subsets was used.)

This shows that there are 4 combinations of 4 elements taken 3 at a time, 6 combinations of 4 elements taken 2 at a time, and 4 combinations of 4 elements taken 1 at a time.

Definition

Given a set of $n$ elements, an unordered selection of $r \le n$ of the elements is called an $r$-combination or a combination of $n$ elements taken $r$ at a time.

The notation $C(n,r)$ represents the number of combinations of $n$ elements taken $r$ at a time. Note that $_nC_r$ is another commonly-used notation for this count, as is the binomial coefficient $n\choose r$. Any of these notations can be read as "$n$ choose $r.$"

Next, notice that every $r$-combination corresponds to $P(r, r) = r!$ different $r$-permutations. That is, to select an $r$-permutation we could instead first select an $r$-combination without regard to order and then order the $r$ elements.

As an example, suppose we have $n$ elements and want to construct a $3$-permutation. We could instead first choose a $3$-combination of the elements. There are $C(n,3)$ possible $3$-combinations. Once we have chosen a specific $3$-combination, we can reorder the 3 elements in $P(3,3) = 3!$ ways. This argument shows that $P(n,3) = C(n,3) \cdot P(3,3) = C(n,3) \cdot 3!$, so $C(n,3) = \displaystyle \frac{P(n,3)}{3!}$.

We can generalize this argument for each natural number $r \leq n$ to arrive at the next theorem.

Theorem

$C(n,r) = \displaystyle \frac{P(n,r)}{r!} = \frac{n!}{r!(n-r)!}$
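The relationship between permutations and combinations can be checked on the letters $W, X, Y, Z$ used earlier. The sketch below is an illustration that lists the 3-combinations with `itertools.combinations` and compares the count with $P(4,3)/3!$ and with `math.comb` (Python 3.8 or later).

```python
from itertools import combinations, permutations
from math import comb, factorial

letters = ["W", "X", "Y", "Z"]
r = 3

combos = list(combinations(letters, r))  # unordered selections
perms = list(permutations(letters, r))   # ordered selections

for combo in combos:
    print(combo)

# Each combination accounts for r! of the permutations.
print(len(combos), len(perms) // factorial(r), comb(len(letters), r))
```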

Example 13 - Combinations of 5 elements taken 3 at a time

The code below calculates the number of and lists combinations given $n$ and $r$.

How many lines of output will be printed by the following code? Remember to count the final print statement after the loop, too.


You try

Edit the code to list and count combinations of 5 choose 4.

Video Example

Example 14

How many ways can five cards be dealt from a standard 52-card deck?

Solution

We are choosing 5 cards from 52 cards and the order does not matter, so $C(52,5)=\displaystyle \frac{52!}{5!47!} = 2,\!598,\!960.$

Example 15

How many bit strings of length $n$ contain exactly $r$ 0s?

Solution

Choosing the positions of the $r$ 0s corresponds to the $r$-combinations of the set $\{1, 2, 3, \dots, n\}$. Thus there are exactly $C(n,r)$ such bit strings.
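For small values, this count can be confirmed by listing every bit string. The values $n = 5$ and $r = 2$ in the sketch below are arbitrary choices made for illustration.

```python
from itertools import product
from math import comb

n, r = 5, 2

# Generate all 2**n bit strings of length n.
strings = ["".join(bits) for bits in product("01", repeat=n)]

# Keep only the strings with exactly r zeros.
with_r_zeros = [s for s in strings if s.count("0") == r]

print(with_r_zeros)
print(len(with_r_zeros), comb(n, r))
```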

10.3.1. Properties Of Combinations

In this subsection you will learn about some properties of $C(n, r)$ and a famous "number triangle;" you will see that the values listed in the triangle coincide with the values of $C(n,r).$

Pascal’s Triangle

Consider the following number triangle. \[{\displaystyle {\begin{array}{c}1\\1\quad 1\\1\quad 2\quad 1\\1\quad 3\quad 3\quad 1\\1\quad 4\quad 6\quad 4\quad 1\\1\quad 5\quad 10\quad 10\quad 5\quad 1\\1\quad 6\quad 15\quad 20\quad 15\quad 6\quad 1\\1\quad 7\quad 21\quad 35\quad 35\quad 21\quad 7\quad 1\end{array}}}\]

We refer to the top row of this triangle as "row 0," and the left side of the triangle as "column 0" (so the columns are actually drawn diagonally). Notice that each number that is in row 2 or lower (and not on one of the sides of the triangle) is the sum of the two numbers directly above it in the triangle. For example, in row 5, column 2, the number 10 is the sum of the numbers in row 4, column 1 (the number 4) and row 4, column 2 (the number 6).

This number triangle is often called "Pascal’s Triangle" after the French mathematician Blaise Pascal, who wrote about the triangle in the mid-1600s. However, the number triangle was known for centuries before Pascal lived. You may also want to see the "History" section of this Wikipedia page for additional information.

RECOMMENDATION: The "Binomial" activity can replace the rest of this subsection.

The Numbers in the Triangle Are The Values Of $C(n,r)$

Notice that $C(0,0),$ $C(1,0),$ and $C(1,1)$ are all equal to 1. These numbers match the values in row 0 and row 1 of the triangle.

Now, consider an alternative way we can compute $C(n+1,r).$ Imagine we have a set containing $n+1$ elements, where one of the elements is "special" - in how many ways can we choose $r$ of the elements? There are two cases: We can choose the "special" element and then choose $r-1$ other elements from the remaining $n$ elements, or we can ignore the "special" element and choose $r$ elements from the remaining $n$ elements.

As a specific example, suppose we want to choose 2 elements from the set of letters $\{ J, K, L, M, N\}$. We could treat $J$ as a special element. In the first case, we would always choose $J$ and then choose 1 of the 4 remaining letters; there are $C(4,1)$ ways to do this. In the second case, we never choose $J$, and must choose two other letters; there are $C(4,2)$ ways to do this. In all, there are $C(5,2)$ ways to choose 2 letters from the set, and the sum rule from the Counting: Arithmetic Techniques chapter can be applied to show that $C(5,2)$ must be equal to $C(4,1) + C(4,2).$

In the general case of combinations of $n+1$ elements taken $r$ at a time, we have the following theorem.

Theorem - Pascal’s identity

For natural numbers $n$ and $r$ such that $1 \leq r \leq n,$ \[C(n+1,r) = C(n, r-1) + C(n,r)\]

This theorem shows that the numbers in each row of the triangle are the same numbers we can compute as $C(n,r).$ That is, since row 0 and row 1 contain the values for $C(0,0),$ $C(1,0),$ and $C(1,1),$ row 2 must contain the values for $C(2,0),$ $C(2,1),$ and $C(2,2),$ and row 3 must contain the values for $C(3,0),$ $C(3,1),$ $C(3,2),$ and $C(3,3).$ We can continue this pattern for all rows of the triangle. This is an informal proof that the number triangle is made up of the values of $C(n,r)$ for all natural numbers $r$ and $n$ with $r \leq n.$
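The row-by-row construction can be sketched in code. The function `pascal_rows` below is a hypothetical helper written for illustration: it builds each row from the previous one using only Pascal’s identity, then compares every entry with `math.comb` (Python 3.8 or later).

```python
from math import comb

def pascal_rows(num_rows):
    """Return rows 0 .. num_rows-1 of the triangle (assumes num_rows >= 1)."""
    rows = [[1]]  # row 0
    for n in range(1, num_rows):
        prev = rows[-1]
        # Interior entries: C(n, r) = C(n-1, r-1) + C(n-1, r); the two sides are 1.
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows

rows = pascal_rows(8)
for row in rows:
    print(row)

# Every entry in row n, column r should equal C(n, r).
print(all(rows[n][r] == comb(n, r) for n in range(8) for r in range(n + 1)))
```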

$C(n,r) = C(n,n-r)$

You may recall from an earlier example that there are $C(4,3) = 4$ possible ways to choose three letters from the set $\{ W, X, Y, Z \}$ and that there are $C(4,1) = 4$ possible ways to choose one letter from the set $\{ W, X, Y, Z \}.$ The equation $C(4,3) = C(4,1)$ does not "just happen to be true" but in fact must be true: Each combination of 4 elements taken 3 at a time corresponds to a combination of 4 elements taken 1 at a time, and vice versa - we can choose 3 of the 4 elements by "throwing out" the 1 element we don’t want to keep. That is, we choose the 3-element subset we care about indirectly by instead choosing the 1-element we do not care about. There is a one-to-one correspondence between the subsets that contain 3 letters and the subsets that contain 1 letter: \[\{ W, X, Y \} \text{ corresponds to } \{ Z \}\] \[\{ W, X, Z \} \text{ corresponds to } \{ Y \}\] \[\{ W, Y, Z \} \text{ corresponds to } \{ X \}\] \[\{ X, Y, Z \} \text{ corresponds to } \{ W \}\]

In general, there is always a one-to-one correspondence between the combinations of $n$ elements taken $r$ at a time and the combinations of $n$ elements taken $n-r$ at a time: Choosing $r$ elements for a subset corresponds to choosing the $n-r$ elements to leave out of the subset. This is an informal proof of the following theorem.

Theorem

For natural numbers $n$ and $r$ such that $r \leq n,$ \[C(n,r) = C(n,n-r)\]

Alternatively, this theorem can be proven algebraically using the formula $C(n,r) = \frac{n!}{r!(n-r)!}.$

10.4. The Binomial Theorem

An algebraic expression that is the product of a number and power of zero or more variables is called a term. Two terms are called like terms if the two terms have the exact same variables and those variables appear with the exact same exponents. For example, $a,$ $ab,$ $5a,$ and $3a^{2}$ are terms, and $a$ and $5a$ are like terms. Like terms can be added to make a new term, for example, $a + 5a$ is $6a$ where we’ve used the distributive property of multiplication over addition to write \[ a + 5a = 1a + 5a = (1+5)a = 6a \]

Now consider the product of two algebraic expressions $a+b$ and $x+y$ where the variables represent real numbers. We can rewrite $(a+b)(x+y)$ in an expanded form by using the distributive property of multiplication over addition: \[ (a+b)(x+y) = a(x+y) + b(x+y) = ax + ay + bx + by. \]

Another way to view this multiplication is as follows: You need to sum all the possible products you can form by choosing a first factor from the set $\{ a, \, b \}$ and a second factor from the set $\{ x, \, y \}.$ Apply the multiplication rule to compute that there must be $(2)(2) = 4$ possible products, namely, $ax,$ $ay,$ $bx,$ and $by.$ In this way, we can calculate the same result, \[ (a+b)(x+y) = ax + ay + bx + by. \]

In case both factors are $(a+b)$, we get \[ (a+b)(a+b) = a(a+b) + b(a+b) = aa + ab + ba + bb = a^{2} + 2ab + b^{2}. \]

Notice that the coefficients can be thought of as the number of ways to choose $b$ for each term during the multiplication: \[ a^{2} + 2ab + b^{2} = C(2,0)a^{2} + C(2,1)ab + C(2,2)b^{2} = {2\choose0}a^{2} + {2\choose1}ab + {2\choose2}b^{2}. \]

Look at the next highest powers:

\begin{equation} \begin{aligned} (a+b) \left( a^{2} + 2ab + b^{2} \right) {} & = a \left( a^{2} + 2ab + b^{2} \right) + b \left( a^{2} + 2ab + b^{2} \right) \\ & = a^{3} + 2a^{2} b + ab^{2} + a^{2} b + 2ab^{2} + b^{3} \\ & = (1)a^{3} + (2+1) a^{2} b + (1+2) ab^{2} +(1) b^{3} \\ & = a^{3} + 3 a^{2} b + 3 ab^{2} + b^{3} \end{aligned} \end{equation}

\begin{equation} \begin{aligned} (a+b) \left( a^{3} + 3 a^{2} b + 3 ab^{2} + b^{3} \right) {} & = a \left( a^{3} + 3 a^{2} b + 3 ab^{2} + b^{3} \right) + b \left( a^{3} + 3 a^{2} b + 3 ab^{2} + b^{3} \right) \\ & = a^{4} + 3 a^{3} b + 3 a^{2} b^{2} + ab^{3} + a^{3} b + 3 a^{2} b^{2} + 3 ab^{3} + b^{4} \\ & = (1)a^{4} + (3 + 1) a^{3} b + (3 + 3) a^{2} b^{2} + (1 + 3) ab^{3} + (1)b^{4} \\ & = a^{4} + 4 a^{3} b + 6 a^{2} b^{2} + 4 ab^{3} + b^{4} \end{aligned} \end{equation}

Notice that the coefficients in the algebraic expansions above are the same numbers that appear in Pascal’s arithmetic triangle.

We will prove in the Proofs: Mathematical Induction chapter that \[ (a+b)^{n} = \sum\limits_{i=0}^{n} {n\choose i} a^{n-i} b^{i} \] for every natural number $n.$
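Before the proof, the expansion can be spot-checked numerically. In the sketch below, the function name `binomial_expansion` and the sample values $a = 2,$ $b = 3$ are arbitrary choices made for illustration; the index $i$ runs from $0$ to $n$ so that the $a^{n}$ term is included.

```python
from math import comb

def binomial_expansion(a, b, n):
    """Sum the terms C(n, i) * a**(n-i) * b**i for i = 0, 1, ..., n."""
    return sum(comb(n, i) * a ** (n - i) * b ** i for i in range(n + 1))

for n in range(6):
    print(n, binomial_expansion(2, 3, n), (2 + 3) ** n)
```

A check like this only covers the values that are actually computed, which is why a proof is still needed.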

10.5. Exercises

List all the permutations of $\{1, 2, 3\}$.
How many permutations are there of the set $\{1, a, 2, b, 3, c, 5\}$?
Let $A=\{a, b, c, d\}$
1. List all the 3-permutations of $A$.
2. List all the 3-combinations of $A$.
Let $A=\{a, b, c, d, e\}$
1. List all the 2-permutations of $A$.
2. List all the 2-combinations of $A$.
Find the value of the following
1. $P(5,2)$
2. $P(10,8)$
3. $P(14,10)$
4. $P(12,8)$
5. $C(5,2)$
6. $C(10,8)$
7. $C(14,10)$
8. $C(12,8)$
How many bit strings of length 10 contain:
1. Exactly five 1s?
2. At most five 1s?
3. At least four 1s?
4. The same number of 0s and 1s?
How many permutations of the digits $12345678$ contain:
1. The string 284?
2. The string 3581?
3. The strings 21 and 57?
How many ways are there to choose 9 cards from a standard 52 card deck?
How many ways can you be dealt a pair in a 5 card hand (2 cards of the same rank and 3 cards of a different rank)?
How many ways can you be dealt a full house in a 5 card hand (2 cards of the same rank and 3 cards of the same rank)?
How many license plates consist of 4 letters followed by 3 digits if:
1. Repetition is allowed?
2. Repetition is not allowed?
Using $C(n,r) = \displaystyle \frac{n!}{r!(n-r)!}$, evaluate the terms of this triangular table. Will you need the formula to extend the table to more rows?

\begin{array}{ccccccccccccc} &&&&&&&C(0,0)&&&&&&\\ &&&&&& C(1,0) && C(1,1) &&&&&\\ &&&&& C(2,0) && C(2,1) && C(2,2) &&&&\\ &&&& C(3,0) && C(3,1) && C(3,2) && C(3,3) &&&\\ &&& C(4,0) && C(4,1) && C(4,2) && C(4,3) && C(4,4) &&\\ \end{array}

11. Proofs: Mathematical Induction

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on November 3, 2025.
inserted proof of correctness of the MergeSort algorithm using strong induction
Earlier updates include:
inserted additional sample proof (Example 7: $11^{n}-6$ is divisible by $5$ for all natural numbers $n.$)
linked setup of proof of Binomial Theorem to earlier discussion in the Counting: Permutations and Combinations chapter
added new “coder-friendly” version of example in the first subsection (sum of consecutive odd positive integers)
added additional “setting up a proof” exercises (complete graphs, binomial theorem)
added note about the Principle of Mathematical Induction and the Well-Ordering Principle
removed references to universal generalization throughout.
restructured discussion of proof by induction and the Principle of Mathematical Induction

In this chapter, you will learn how to use the mathematical induction proof technique to create a single proof of infinitely many different but related propositions. This proof technique will also be used to validate algorithms.

Key terms and concepts covered in this chapter:

Mathematical induction
- Weak and strong induction (i.e., First and Second Principle of Induction)
- examples of mathematical induction
Well-ordering and Induction
Structural induction

11.1. Why Is Mathematical Induction Needed?

We often encounter an infinite sequence of related propositions, all of which appear to be True. To prove all of the propositions, we might try to write a proof for each individual proposition and then combine all those proofs together as a single infinitely-long “proof”… but recall from the chapter on Proofs: Basic Techniques that a proof consists of a finite sequence of propositions of finite length.

As an analogy, imagine you had an “algorithm” that required infinitely-many steps to complete its task… such an algorithm would not be useful in the real world since it might never complete the task!

Example 1 - What is this code computing?

The Python code below computes the sum of consecutive odd positive integers, starting at 1, and stops at the first sum that is greater than or equal to too_small which was initialized to the value 10. You should trace through the code to see how it works.


You may notice that each number that is printed is a perfect square integer. Will this be True no matter how many values we tell the code to print? Try editing the code and changing the initialization value of too_small to a larger integer such as 100 to see that this pattern persists.

To better analyze this code, let’s insert a variable the_counter which we’ll use to count how many odd positive integers have been added to the_sum so far.


Notice that each print statement now seems to display a positive integer and its square… but we are not directly computing that square! Again, you can edit this code to change the initial value of too_small to a larger integer such as 100 or even 10000 to see that this pattern persists.

To make it more clear, let’s edit the print statements so that the square of the_counter is displayed along with the other values. Also, since the_counter appears to be such an important variable, let’s replace the loop condition with one that involves that variable (so too_small now is used as a bound on how many consecutive odd positive integers we’ve added instead of a bound on the cumulative sum.)
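A sketch of what this final version might look like follows. It is a reconstruction from the description above and not necessarily the code that was originally embedded here: the variable names too_small, the_counter, and the_sum come from the text, while `next_odd` is a hypothetical name introduced for illustration.

```python
too_small = 10   # now a bound on how many odd integers are added
the_counter = 1  # how many odd integers have been added so far
next_odd = 1     # hypothetical name: the most recent odd integer added
the_sum = 1      # running sum of the odd integers

# Initialization block: one odd integer (1) has been added.
print(the_counter, the_sum, the_counter ** 2)

while the_counter < too_small:
    next_odd += 2
    the_sum += next_odd
    the_counter += 1
    # the_sum is computed by adding; the square is printed only for comparison.
    print(the_counter, the_sum, the_counter ** 2)
```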


At this point, it may seem “obvious” that this code is showing us that the sum of the first $n$ consecutive odd positive integers is always equal to $n^{2}$ (where $n$ can be any positive integer), but notice that we can only use the code to show this for as many values as we can actually compute! The same issue arises here as in the brute-force method for finding the least element of a nonempty subset of the natural numbers that was discussed after introducing the Well-Ordering Principle for $\mathbb{N}$: we cannot actually run the code for very large values of too_small like $10^{10^{10^{10^{10}}}}$ because we don’t have sufficient computing resources (including time and space) to do so.

In a “mathematically ideal” universe we would simply change the loop to use while True: and let the code run forever, then just check that each printed line displays the_counter followed by two copies of its square. However, since we are human and have limited computing resources (including time and space) we will need another way of justifying that every value of the_sum will be a square without running an infinite loop. The approach we’ll take is to justify that no matter how many times we go through the loop, the value of the_sum will be the square of the_counter when we reach the end of the block of code within the loop as long as that was also True for the previous iteration of the loop (and was also True at the end of the initialization block, before ever entering the loop.) This technique is, essentially, “proof by mathematical induction.”
There are other approaches you could use to justify that the_sum will be the square of the_counter in this particular algorithm, but the goal of this example is to try to explain why we need the more general technique of mathematical induction to validate algorithms.

Now let’s look at the same phenomenon using a mathematician’s point of view.

Example 2 - Why Use Mathematical Induction?

Let’s examine a conjecture about a certain type of polygonal number, namely, square numbers.

Consider the image that shows 16 colored disks arranged as a square. By starting at the lower left corner of the square and grouping disks of the same color, we can count the total number of disks in the figure as follows. \begin{equation} \begin{aligned} 1 {} & = 1^{2} \\ 1 + 3 {} & = 4 = 2^{2} \\ 1 + 3 + 5 {} & = 9 = 3^{2} \\ 1 + 3 + 5 + 7 {} & = 16 = 4^{2} \\ \end{aligned} \end{equation}

Notice that the sum of the odd integers on the left-hand side of each equation is equal to the square of the number of odd integers on the left-hand side of the equation. This means that the predicate \[P(n)\text{: "The sum of the first } n \text{ positive odd integers is equal to } n^{2} \text{."}\] is a True statement for the natural numbers $n \in \{ 1, 2, 3, 4 \},$ that is, each of the following four propositions is True:

$P(1)$: "The sum of the first $1$ positive odd integers is equal to $1^{2}$." (I know this is not proper English, but please bear with me!)
$P(2)$: "The sum of the first $2$ positive odd integers is equal to $2^{2}$."
$P(3)$: "The sum of the first $3$ positive odd integers is equal to $3^{2}$."
$P(4)$: "The sum of the first $4$ positive odd integers is equal to $4^{2}$."

In the following code snippet, function $P$ implements the predicate used in this example. Recall that the predicate’s output is a proposition - the output is just a string of symbols and does not indicate whether the proposition is True or False.
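A minimal sketch of such a function is shown below; the exact wording of the returned string is an assumption based on the propositions listed above. The function only builds the text of the proposition and makes no attempt to decide whether it is True.

```python
def P(n):
    """Return the proposition P(n) as a string; its truth value is not evaluated."""
    return f"The sum of the first {n} positive odd integers is equal to {n}^2."

for n in range(1, 5):
    print(P(n))
```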


Next, consider this second image, which shows how you could change the first image to one that can be used to prove that $P(5)$ is True.
It’s easy to show that the proposition $P(5)$ is True because you can just write out and verify that $1 + 3 + 5 + 7 + 9 = 25 = 5^{2}.$ The goal here is to relate the truth value of the proposition $P(5)$ to the truth value of the “preceding proposition” $P(4),$ in a way similar to how the numerical value of a term of a sequence can be related to the numerical value of the “preceding term” of the sequence using a recurrence relation.

Notice that the second image shows that, given the square arrangement of disks that has 4 disks along each side, we can construct a square arrangement that has 5 disks along each side by inserting 4 new disks above the top row, 4 new disks in a column to the right of the rightmost column, and 1 new disk in the upper right corner to complete the square arrangement.

In fact, there is nothing special about the number 4 in the previous paragraph: If $k$ is any positive natural number and we have a $k \times k$ square arrangement of disks, we can enlarge it to a $(k+1) \times (k+1)$ square arrangement of disks by inserting $k$ disks above, $k$ disks to the right, and $1$ disk in the upper right corner of the $k \times k$ square to complete the $(k+1) \times (k+1)$ square. Algebraically, we can account for the total number of disks in the $(k+1) \times (k+1)$ square by writing \[ k^{2} + k + k + 1 = (k+1)^{2} \] which is True for any natural number $k$ (Just simplify the left-hand side and expand the right-hand side of the equation to see that the equation must be True.)

Based on this second image, we can make a conjecture that $P(n)$ must be True for every positive natural number $n.$ We now need to prove this conjecture.

Notice that if we combine the propositions

$P(1)$ and $P(2)$ and $P(3)$ and $P(4)$
For all $k \in \mathbb{N},$ $P(k)$ implies $P(k+1).$

then we can build a proof for all the integers up to and including any value of $n \in \mathbb{N}$ that we want.

For example, to prove $P(1,\!000,\!000),$ we could start by asserting that $P(1)$ is True, then apply the conditional $P(k) \rightarrow P(k+1)$ along with the rule of inference modus ponens $999,\!999$ times to prove that $P(1,\!000,\!000)$ is True. This proof is finite - I never claimed that the proof would be short!

As an analogy, think of repeatedly applying modus ponens to the conditional as using a loop in code. We are just repeating the same argument $( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)$ over and over again as the value of the variable $k$ is incremented by 1 at the end of each loop iteration until we reach the value that we want to stop at. In the following code snippet, the user-defined function addTheOdds implements the computations used in this example’s argument. Notice how the Basis Step corresponds to validating the loop initialization and how the Induction Step corresponds to validating the output at the end of each loop iteration (assuming that the values were correct at the start of that loop iteration.) You can change the value of $n$ in the code to confirm the truth value of $P(n)$ for any integer you’d like, assuming you have enough time and computing resources.
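One way such a function might look is sketched below. The name addTheOdds comes from the text, but the body is an assumed reconstruction and not necessarily the original code; the `assert` statements mark where the Basis Step and Induction Step are being checked, and the sketch assumes $n \geq 1.$

```python
def addTheOdds(n):
    """Add the first n positive odd integers (assumes n >= 1)."""
    # Basis Step: after initialization, one odd integer has been added.
    k = 1
    the_sum = 1
    assert the_sum == k ** 2

    # Induction Step: if the_sum == k**2 at the start of an iteration,
    # adding the next odd integer, 2k + 1, makes the_sum == (k+1)**2.
    while k < n:
        the_sum += 2 * k + 1
        k += 1
        assert the_sum == k ** 2

    return the_sum

n = 1000
print(addTheOdds(n) == n ** 2)
```

Each pass through the loop plays the role of one application of modus ponens, and the run is finite for any chosen $n.$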


Since we can now, in principle, build a proof of $P(n)$ for any value of $n \in \mathbb{N}$ that we could choose, we conclude that $(\forall n \in \mathbb{N})P(n).$ That is, we have proven the proposition \[ \text{"For every positive natural number } n \text{, the sum of the first } n \text{ positive odd integers is equal to } n^{2} \text{."}\] Another way to look at this is to define $s(n)$ to be the number of disks in an $n \times n$ square arrangement of disks. We have shown that $s(1) = 1$ and that $s(k+1) = s(k) + 2k + 1$ for every positive natural number value of $k,$ and have concluded that $s(n) = n^{2}$ for every positive natural number $n.$

We will rewrite the above example more formally (with full algebraic detail) later in the chapter.

11.2. The “Proof By Mathematical Induction” Technique

A proof by mathematical induction of a predicate $P(n)$ defined for natural numbers $n \in \mathbb{N}$ consists of three steps.

Basis Step	Prove the predicate $P(n)$ is True for some small value of $n.$ In most but not all cases, you prove either $P(0)$ or $P(1).$ You can also prove $P(n)$ for finitely-many other values if it helps you get a feel for what needs to be proven, as was done in “the sum of the first $n$ consecutive odd natural numbers is the perfect square $n^{2}$” in the previous section.
Induction Step	Prove that the conditional statement $P(k) \rightarrow P(k+1)$ is True for any natural number $k$ that is greater than or equal to the value used in the Basis Step. In this context, the predicate $P(k)$ is called the induction hypothesis and is assumed to be True, where $k$ represents an arbitrary natural number. By assuming that $P(k)$ is already True and also proving the conditional statement $P(k) \rightarrow P(k+1),$ you can use modus ponens to infer that $P(k+1)$ must also be True: You can “step up” from any natural number to the next larger natural number and maintain the truth value of the predicate.
Conclusion Step	Conclude that $P(n)$ is True for all natural numbers $n$ that are greater than or equal to the smallest value used in the Basis Step. Some sources do not list the Conclusion Step as part of a proof by mathematical induction, but the Remix includes it to emphasize that this step must be done to complete the proof.

You can compare the first two steps of a proof by mathematical induction to the two steps used in a recursive definition as in the Sequences and Recursion chapter. Note that a recursive definition is used to describe and define a process for constructing objects or a set of objects or a structure, but a proof by mathematical induction is used to justify and validate such a process.

Note that each of the three steps will be a proof of finite length, but will allow us to conclude that $P(n)$ is true for every natural number $n$ greater than or equal to some small natural number $b \geq 0$.

As an analogy, imagine we are building a tower using interlocking toy blocks. How tall can the tower be? The basis step involves placing a foundation on the ground (either a flat surface for $n = 0$, or a first block for $n = 1$), and the induction step justifies that if we have built a tower that has height $k$ then we can build a tower of height $k+1$ by placing one more block on the tower. The conclusion step states that we can build a tower that is of any finite height $n$ (as long as we have enough blocks and ignore issues arising from real-world physics!) Note that we never build an infinitely tall tower.
Note: Some textbooks and sources use an "infinite ladder" analogy for mathematical induction, but this is not quite correct. A better analogy is a ladder that can be extended to any finite height you need, but that is always of finite height.

Let’s finish this section with an example of using this proof technique. Here is a proof of the Handshake Theorem for graphs.

Example 3 - Proof of the Handshake Theorem

We will prove the following proposition using proof by mathematical induction.

Theorem

If $G$ is a graph with vertex set $V$ and edge set $E,$ where both $V$ and $E$ are finite sets, then the sum of the degrees of all vertices in $V$ is equal to 2 times the number of edges in $E.$

Notice first that since this needs to be proven for any graph with any finite number of vertices and edges, it is a good candidate for proof by mathematical induction.

Which number should be the one we use for induction?

We could try using induction on the number of vertices, but notice that we can add an isolated vertex without affecting either the sum of the degrees or the number of edges. This indicates that the number of vertices is not the correct variable to use for a proof by induction.
We could try using induction on the sum of the degrees of the vertices, but we’d have to figure out how to add 1 to that sum… but notice that, as above, adding a new vertex $v$ to the graph
- either leaves the sum of degrees unchanged (if $v$ is isolated)
- or changes the sum of degrees by 2 (because either $v$ is an endpoint of a loop or $v$ comes along with a new edge that connects to another vertex of the graph.) Since we cannot meaningfully add 1 to the sum of the degrees, this is also not the correct variable to use for induction.
We could try using induction on the number of edges. This could work since adding a new edge will increase the sum of degrees of the vertices by 2 (whether the vertices are "new" or "old").

So, to prove the theorem, let $n$ represent the number of edges in a graph and let $P(n)$ be the predicate \[P(n)\text{: "The sum of the degrees of the vertices for a graph with } n \text{ edges is equal to } 2 \cdot n \text{."}\] We will prove that $(\forall n \in \mathbb{N})P(n).$

Basis Step: $P(0)$ is the proposition "The sum of the degrees of the vertices for a graph with $0$ edges is equal to $2 \cdot 0.$" Notice that a graph with 0 edges is just a collection of isolated vertices, and each of the isolated vertices has degree 0, so the sum of the degrees of the vertices must also be 0, which is 2 times the number of edges. This means that $P(0)$ is True, and the Basis has been established.

Induction Step: First, we assume that the induction hypothesis $P(k)$ is True for some natural number $k$.

Secondly, we will prove that the conditional $P(k) \rightarrow P(k+1)$ must be True, which means we can use modus ponens (or the equivalent tautology $( P(k) \land ( P(k) \rightarrow P(k+1) ) ) \rightarrow P(k+1)$) to show that $P(k+1)$ is also True.

For the natural number $k,$ the predicate $P(k)$ is the proposition "The sum of the degrees of the vertices for a graph with $k$ edges is equal to $2 \cdot k.$"

Suppose that we have a graph with $k$ edges. Recall that we are assuming that the Induction Hypothesis $P(k)$ is True, so the sum of the degrees of the vertices is equal to $2 \cdot k.$

Next, we insert one new edge $e$ into the graph. There are, essentially, two possible cases to consider.

$e$ has two different endpoints, in which case the degree of each endpoint increases by 1 when $e$ is added to the graph, so the sum of the degrees of the vertices increases by 2, or
$e$ is a loop with only one endpoint, in which case the degree of that endpoint increases by 2, so the sum of the degrees of the vertices increases by 2.
Notice that in either bullet, it does not matter whether the new edge $e$ has endpoints that are "new" to the graph or were "old" vertices that were in the graph already.

Notice that any graph with $k+1$ edges can be built up this way from a graph that had $k$ edges (or, if it is easier to think of, every graph with $k+1$ edges can be changed to one with only $k$ edges by temporarily removing one edge). So, using the fact that $2 \cdot k + 2 = 2 \cdot (k+1),$ we have proven that "The sum of the degrees of the vertices for a graph with $k+1$ edges is equal to $2 \cdot (k + 1).$" That is, we have proven that $P(k) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that $(\forall n \in \mathbb{N})P(n),$ which translates to "For all natural numbers $n,$ the sum of the degrees of the vertices for a graph with $n$ edges is equal to $2 \cdot n.$"

Q.E.D.
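The counting argument in the Induction Step can be mirrored in a few lines of Python. The sketch below is only an illustration, not part of the proof. It assumes that a graph is stored as a list of edges, where each edge is a pair of endpoint labels and a loop repeats its single endpoint; the helper name `degree_sum` is made up for this example.

```python
from collections import Counter

def degree_sum(edges):
    """Return the sum of the vertex degrees for a graph given as a list of edges.

    Each edge is a pair (u, v). A loop is written as (u, u) and adds 2 to the
    degree of u, which matches the second case in the Induction Step.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree.values())

edges = []
assert degree_sum(edges) == 0          # P(0): no edges, so the degree sum is 0

for new_edge in [("a", "b"), ("b", "c"), ("c", "c"), ("a", "b")]:
    before = degree_sum(edges)
    edges.append(new_edge)
    # Adding one edge (a loop or not) raises the degree sum by exactly 2.
    assert degree_sum(edges) == before + 2

print(degree_sum(edges), 2 * len(edges))
```

Isolated vertices do not appear in the edge list, but they have degree 0, so leaving them out does not change the sum.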

11.3. The Principle of Mathematical Induction

The “proof by mathematical induction” technique is justified by the Principle of Mathematical Induction, often abbreviated as PMI.

The Principle of Mathematical Induction

Suppose that $S$ is a subset of $\mathbb{N}$ such that

$0$ is an element of $S,$ and
if $k$ is an element of $S,$ then $k+1$ is an element of $S,$ too.

Then every natural number is contained in $S,$ that is, $S = \mathbb{N}.$

In the Remix, the Principle of Mathematical Induction is treated as an axiom, a statement that is assumed to be True about the set of natural numbers.
This principle has been used for nearly 2,000 years, but was only formalized by mathematicians about 150 years ago. See the History section, as well as the References, at this Wikipedia page for more information.

Note to instructors

The Principle of Mathematical Induction may seem like an “obviously True” statement. If 0 is in a set, and if for each natural number in the set the next biggest natural number is also in the set, then it would seem that you can list all the elements of the set by simply counting up by 1, starting at 0. The issue that arises is that you can never stop counting up because there are infinitely many natural numbers! Instead, mathematicians can state that they are assuming that the Principle of Mathematical Induction is True without further justification.

As another example of why PMI is assumed to be True, consider the predicate that was used in the first example of this chapter: \[P(k)\text{: "The sum of the first } k \text{ positive odd integers is equal to } k^{2} \text{."}\]

What a mathematician would really like to do is to run an infinite loop, like the one shown in the code listing below, to verify that the predicate is True for every natural number greater than or equal to 1.

def P(k):
    return f"\"The sum of the first {k} positive odd integers is {k*k}\""

def addTheOddsForever():
    sum = 1
    k = 1
    if sum != k*k:
        print(P(k),"is a False proposition.")
        return False
    while True: # loop forever...
        sum = sum + 2*k + 1
        k = k + 1
        if sum != k*k:
            print(P(k),"is a False proposition.")
            return False
    return True

if addTheOddsForever():
    print("P(n) is a True proposition for every positive natural number n.")

When addTheOddsForever() returns True, the code will have verified the predicate for every positive natural number… but when will addTheOddsForever() ever return True? We only exit the loop if the predicate is False for some value of k (that is, if we find a counterexample); otherwise we will never exit the loop to return True! We will have to wait “forever” for the answer, which does not help us humans much: we would run out of memory or processing power or time in the universe before that.

Since the code with the infinite loop cannot verify the predicate for all natural numbers greater than or equal to 1, the mathematician will settle for using the following code sample instead, as long as the variable stop_value can be made larger and larger, with no upper bound on how large it can be. (Again, this won’t really work due to real-world constraints, but at least it does avoid the infinite loop.)
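A minimal sketch of such a function is shown here. The body of `addTheOddsRevised` is an assumption modeled on `addTheOddsForever()`, with the infinite loop replaced by a loop that ends at `stop_value`; the value 1000 used at the bottom is arbitrary.

```python
def P(k):
    return f"\"The sum of the first {k} positive odd integers is {k*k}\""

def addTheOddsRevised(stop_value):
    # Verify P(1), P(2), ..., P(stop_value), one value of k at a time.
    total = 0
    for k in range(1, stop_value + 1):
        total = total + (2*k - 1)   # add the k-th positive odd integer
        if total != k*k:
            print(P(k), "is a False proposition.")
            return False
    # Unlike the infinite loop, this line is reached after finitely many steps.
    return True

stop_value = 1000
if addTheOddsRevised(stop_value):
    print("P(n) is a True proposition for every natural number n from 1 to", stop_value)
```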

This code can verify the predicate for all natural numbers starting from 1 up to and including stop_value. The Principle of Mathematical Induction is stating that you can justify that the predicate $P(k)$ is True for all positive natural numbers by instead verifying that addTheOddsRevised(stop_value) will return True for every possible positive natural number value you could assign to stop_value.

11.4. More Example Proofs Using Mathematical Induction

Example 4 - An Algebraic Expression for Positive Odd Integers

Let $P(n)$ be the predicate \[P(n)\text{: "The } n\text{th positive odd integer is equal to } 2n-1 \text{."}\] We will prove that $(\forall n \in \mathbb{N}_{>0})P(n),$ that is, for all positive integers $n.$
Notice that you can find evidence for the conjecture that the $k$th positive odd integer is $2k - 1$ by making a table and finding an algebraic formula that matches the table. See this appendix if you don’t remember how to do this.

Basis Step: $P(1)$ is the proposition "The $1$th positive odd integer is equal to $2(1)-1.$" This is True since $2(1)-1 = 1$, in spite of the poor English, which should use "$1$st" instead of "$1$th."

Induction Step: First, we assume that the induction hypothesis $P(k)$ is True for some positive natural number $k$. That is, we assume that "The $k$th positive odd integer is equal to $2k-1$" is True for some positive natural number $k.$

If the $k$th positive odd integer is equal to $2k-1,$ then the $(k+1)$th positive odd integer is obtained by adding $2$ to $2k-1,$ that is, the $(k+1)$th positive odd integer is equal to $(2k-1) + 2,$ which can be rewritten using algebra as \begin{equation} \begin{aligned} (2k-1) + 2 {} & = 2k - 1 + 2 \\ & = 2k + 2 - 1 \\ & = 2(k+1) - 1 \end{aligned} \end{equation}

We have proven that if the $k$th positive odd integer is equal to $2k-1$ then the $(k+1)$th positive odd integer is equal to $2(k+1)-1.$ That is, we have proven $P(k) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that for all positive integers $n,$ the $n$th positive odd integer is equal to $2n-1.$

Q.E.D.

The next example proof is the formal proof that corresponds to the first example of this chapter.

Example 5 - Square Numbers

Let $P(n)$ be the predicate \[P(n)\text{: "The sum of the first } n \text{ positive odd integers is equal to } n^{2} \text{."}\] We will prove that $(\forall n \in \mathbb{N}_{>0})P(n).$

Basis Step: $P(1)$ is the proposition "The sum of the first $1$ positive odd integers is equal to $1^{2}.$" If we allow a sum to have only one addend, then $P(1)$ is True since $1 = 1^{2}$, so $P(1)$ can be used as the basis of our induction proof. If we want to be sure that there are at least two addends in the sum, we can use $P(2)$ as an additional basis. $P(2)$ is the proposition "The sum of the first $2$ positive odd integers is equal to $2^{2}$" which is True because $1 + 3 = 2^{2}.$

Induction Step: First, we assume that the induction hypothesis $P(k)$ is True for some positive natural number $k$.

Based on the formula for the $k\text{th}$ positive odd integer, we can rewrite the sum of the first $k$ positive odd integers $1 + 3+ \ldots + (2k-1)$ using summation notation as $\sum\limits_{i=1}^{k}(2i-1)$ and rewrite the predicate $P(k)$ in algebraic form as either \[ P(k): 1 + 3+ \ldots + (2k-1) = k^{2}\] or \[ P(k): \sum\limits_{i=1}^{k}(2i-1) = k^{2}.\] Note that $P(k)$ is still a predicate - it is stating that a certain equation holds.
Note: A common error is to treat a predicate like $P(k)$ that is written using algebra notation as a function that gives numbers as outputs, but this is incorrect. $P(k)$ is a predicate that gives propositions as outputs. Recall, as mentioned earlier in the Remix, that you can think of this like a programmer: $P(k)$ returns a string, not a number and not a Boolean.

We can now prove that the conditional $P(k) \rightarrow P(k+1)$ must be True using algebra. \begin{equation} \begin{aligned} \sum\limits_{i=1}^{k+1}(2i-1) {} & = \left(\sum\limits_{i=1}^{k}(2i-1)\right) + (2(k+1)-1) \\ & = k^{2} + (2(k+1)-1) \end{aligned} \end{equation} where we have substituted $k^{2}$ for the sum $\sum\limits_{i=1}^{k}(2i-1)$ based on the induction hypothesis.

Now we simplify this using algebra and show that it is the same as the right hand side. \begin{equation} \begin{aligned} \sum\limits_{i=1}^{k+1}(2i-1) {} & = k^{2} + (2(k+1)-1) \\ & = k^{2} + 2k + 2 - 1 \\ & = k^{2} + 2k + 1 \\ & = (k + 1)^{2}\\ \end{aligned} \end{equation}

We have proven that the equation $1 + 3 + \ldots + (2k-1) = \sum\limits_{i=1}^{k}(2i-1) = k^{2}$ implies the equation $1 + 3 + \ldots + (2k-1) + (2k+1) = \sum\limits_{i=1}^{k+1}(2i-1) = (k+1)^{2}$, that is, $P(k) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that $1 + 3 + \ldots + (2n-1) = \sum\limits_{i=1}^{n}(2i-1) = n^{2}$ for all positive integers $n$.

Q.E.D.

Example 6 - The Factorial Grows Faster Than An Exponential Function

Let $P(n)$ be the predicate \[P(n): 2^{n} < n!\] We will prove that $(\forall n \in \mathbb{N}_{\geq 4})P(n).$

Basis Step: First notice that the propositions $P(0),$ $P(1),$ $P(2),$ and $P(3)$ are all False! This is why we must use $P(4)$ as the basis. $P(4)$ is the proposition $2^{4} < 4!$ which is a True statement since $16 < 24.$

Induction Step: First, we assume that the induction hypothesis $P(k)$ is True for some positive natural number $k$. In this context we can assume $k \geq 4.$

Assume that $P(k)$ is True with $k \geq 4,$ that is, $2^{k} < k!.$ If we write $2^{k+1}$ as $2 \cdot 2^{k}$ and then bound each factor, we get \begin{equation} \begin{aligned} 2^{k+1} {} & = 2 \cdot 2^{k} \\ & < (k+1) \cdot 2^{k} \text{ (since $k \geq 4,$ we must have $k+1 \geq 5 >2$)}\\ & < (k+1) \cdot k! \text{ by the induction hypothesis} \end{aligned} \end{equation}

Notice that the expression on the last line is equal to $(k+1)!,$ so we have shown that $(2^{k} < k!) \rightarrow (2^{k+1} < (k+1)!)$ as long as $k \geq 4.$ That is, we’ve proven that $P(k) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that for all positive integers $n \geq 4,$ $2^{n} < n!.$

Q.E.D.
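A quick numerical check can show why the basis has to be $P(4)$. The sketch below is only a finite spot-check, not a proof. The helper name `holds` is made up for this example; it returns a Boolean that says whether the inequality stated by $P(n)$ is satisfied, which is different from the predicate $P(n)$ itself.

```python
from math import factorial

def holds(n):
    """Return True exactly when the inequality 2**n < n! is satisfied."""
    return 2**n < factorial(n)

# The inequality fails for n = 0, 1, 2, 3, which is why the basis is P(4).
print([holds(n) for n in range(0, 4)])

# The inequality is satisfied for n = 4, ..., 50 (evidence only, not a proof).
print(all(holds(n) for n in range(4, 51)))
```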

Example 7 - An Algebraic Expression That Evaluates To An Integer Multiple Of 5 For Each Natural Number Value Assigned To The Variable.

Consider the function that is informally defined for natural number inputs by the rule $f(n) = 11^{n}-6.$

Do you want to see the formal definition of this function?

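As described in the Functions chapter, the formal definition of this function is the ordered triple \[ \left( \{(n, 11^{n}-6) : n \in \mathbb{N} \}, \mathbb{N}, \mathbb{Z} \right). \] The third entry is $\mathbb{Z}$ rather than $\mathbb{N}$ because the output $11^{0}-6 = -5$ is a negative integer.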

We will prove that every output value is an integer multiple of $5.$

Let $P(n)$ be the predicate \[P(n): \text{“}11^{n}-6 \text{ is divisible by } 5\text{.”}\] We will prove that $(\forall n \in \mathbb{N})P(n).$

Basis Step: $P(0)$ is the proposition $\text{“}11^{0}-6 \text{ is divisible by } 5\text{,”}$ which is a True statement since $11^{0}-6 = -5$ and $-5$ is divisible by $5.$ If you find $P(0)$ to be unconvincing, you can use the next biggest natural number as well: $P(1)$ is the proposition $\text{“}11^{1}-6 \text{ is divisible by } 5\text{,”}$ which is True since $11^{1}-6 = 5$ and $5$ is divisible by $5.$

Induction Step: First, we assume that the induction hypothesis $P(k)$ is True for some natural number $k$.

Assume that $P(k)$ is True, that is, assume that $11^{k}-6$ is divisible by $5,$ where $k$ is some natural number. We need to show that $11^{k+1}-6$ must also be divisible by $5.$

Let’s try to rewrite the expression $11^{k+1}-6$ in terms of the expression $11^{k}-6.$ \begin{equation} \begin{aligned} 11^{k+1}-6 {} & = 11 \cdot 11^{k}-6 \\ & = 11 \cdot 11^{k} + (11 \cdot (-6) + 66) - 6 \text{ by adding zero in a clever way} \\ & = (11 \cdot 11^{k} + 11 \cdot (-6)) + (66 - 6) \text{ by regrouping} \\ & = 11 \cdot (11^{k} - 6) + 60 \end{aligned} \end{equation}

Recall that $(11^{k} - 6)$ is assumed to be divisible by $5.$ It is clear that $60$ is divisible by $5,$ so the entire expression $11 \cdot (11^{k} - 6) + 60$ is divisible by $5,$ as well, that is, $11^{k+1}-6$ is divisible by $5.$ We have proven the conditional statement $P(k) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that for all natural numbers $n,$ $11^{n}-6$ is divisible by $5.$

Q.E.D.
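The key rewriting step in the Induction Step, $11^{k+1}-6 = 11 \cdot (11^{k} - 6) + 60,$ can be spot-checked in Python. This is a sketch for illustration only; the range of values tested is an arbitrary choice, and checking finitely many values does not replace the proof.

```python
def f(n):
    return 11**n - 6

for k in range(0, 20):
    # The identity used in the Induction Step.
    assert f(k + 1) == 11 * f(k) + 60
    # Divisibility by 5; note that -5 % 5 is also 0 in Python.
    assert f(k) % 5 == 0

# The integer multipliers m for which f(n) = 5 * m, for n = 0, 1, 2, 3, 4.
print([f(n) // 5 for n in range(0, 5)])
```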

Exercise - Setting Up A Proof By Induction (Solving the Towers of Hanoi)

Set up the proof of the following statement.

The minimal number of moves needed to solve the Towers Of Hanoi puzzle with $n$ discs is $2^{n}-1.$

What is the predicate $P(n)$?

What value of $n$ is best to use in the Basis Step? That is, what is the smallest value of $n$ for which the statement/predicate could be True?

What is the Induction Hypothesis?

What property of the puzzle is the key feature used to prove the Induction Step? You can describe this in words or use an algebraic equation.

Exercise - Setting Up A Proof By Induction (Counting the Edges of a Complete Graph)

Set up the proof of the following statement.

The number of edges of a complete graph that has $n$ vertices is $\frac{1}{2}n(n-1).$

What is the predicate $P(n)$?

What value of $n$ is best to use in the Basis Step? That is, what is the smallest value of $n$ for which the statement/predicate could be True?

What is the Induction Hypothesis?

Suppose that you have a drawing of a complete graph that has $k+1$ vertices. What changes could you make to the drawing to create a drawing of a complete graph that has $k$ vertices?

Exercise - Setting Up A Proof By Induction (The Binomial Theorem)

Set up the proof of the following statement.

For every natural number $n$ and pair of real numbers $a$ and $b,$ \[(a+b)^{n} = \sum\limits_{i=0}^{n} {n\choose i} a^{n-i} b^{i}.\]

What is the predicate $P(n)$?

What value of $n$ is best to use in the Basis Step? That is, what is the smallest value of $n$ for which the statement/predicate could be True?

What is the Induction Hypothesis?

Suppose that you have an expression for $(a+b)^{k}$ as the sum given in the statement. Explain how to determine the coefficient of each term in the expression for $(a+b)^{k+1}.$

MORE PROOFS TO COME!

11.5. Validating An Algorithm Using Induction

In this section, we’ll prove that the Euclidean Algorithm as described below correctly computes the greatest common divisor of two positive integers.

Task: Given the positive integers $a$ and $b$ with $a > b$, compute the greatest common divisor (or "g.c.d.") of $a$ and $b.$ That is, compute the greatest integer that is a factor of both $a$ and $b.$
- Input: Two positive integers
- Steps:
  1. Set $a$ to the greater and $b$ to the lesser of the two input values.
  2. Compute the remainder $r$ when $a$ is divided by $b$ (using long division of integers, not "floating-point" decimals.)
  3. If $r > 0$:
    - Set $a$ equal to $b.$
    - Set $b$ equal to $r.$
    - Go to step 2.
  4. Return the value stored in $b.$
- Output: The greatest positive integer that is a factor of both input values.

Example 8 - The Euclidean Algorithm in Python

The code below implements the Euclidean Algorithm for positive integers a and b, using the remainder from integer division.
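One way the steps of the algorithm could be written in Python is sketched here. The function name `euclid_gcd` and the counter `n` are assumptions added for illustration; the counter records how many times the body of the loop runs, which is the quantity $n$ used in the induction proof later in this section.

```python
def euclid_gcd(x, y):
    """Return (g, n): the g.c.d. of the positive integers x and y, together
    with the number n of times the body of the loop runs."""
    # Step 1: a is the greater input and b is the lesser input.
    a, b = max(x, y), min(x, y)
    n = 0
    # Step 2: compute the remainder using integer arithmetic.
    r = a % b
    # Step 3: repeat for as long as the remainder is nonzero.
    while r > 0:
        a = b
        b = r
        r = a % b
        n = n + 1
    # Step 4: b holds the last nonzero remainder (or the lesser input if n == 0).
    return b, n

print(euclid_gcd(126, 35))   # (7, 3)
```

For the inputs 126 and 35 the remainders are 21, 14, 7, and then 0, so the loop body runs three times and the returned g.c.d. is 7.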

First, we will prove a lemma (that is, a minor theorem) that we’ll use in the induction step of the main proof.

Lemma

If $a$ and $b$ are integers such that $a > b > 0$ and the integers $q$ and $r$ satisfy \[ a = q \cdot b + r \text{ and } 0 \leq r < b \] then the set of positive integers that divide both $a$ and $b$ is the same as the set of positive integers that divide both $b$ and $r.$

Proof

Notice that $q$ is the quotient and $r$ is the remainder that result when doing a long division of $a$ by $b.$

Also notice that we can rewrite the equation $a = q \cdot b + r$ in the equivalent form $r = a - q \cdot b.$

Before starting the formal proof, let’s look at an example. When we divide $a = 126$ by $b = 35$ using long division of integers, we get the quotient $q = 3$ and the remainder $r = 126 - 3 \cdot 35 = 21.$ That is, we can write the two equations \[126 = 3 \cdot 35 + 21 \text{ and } 21 = 126 - 3 \cdot 35\] which are both True. It is easy to see that for any integer, that integer divides both $126$ and $35$ if and only if that same integer divides both $35$ and $21.$

We now prove the lemma.

If the integer $c$ divides both $a$ and $b$ then there are integers $a'$ and $b'$ so that $a = c \cdot a'$ and $b = c \cdot b'.$ Substitute these two new expressions in the equation $r = a - q \cdot b$ to get \begin{equation} \begin{aligned} r {} & = a - q \cdot b \\ & = (c \cdot a') - q \cdot (c \cdot b') \\ & = c \cdot a' - c \cdot (q \cdot b') \\ & = c \cdot (a' - q \cdot b') \end{aligned} \end{equation} which shows that $c$ is also a divisor of $r.$ Since we already assumed that $c$ divides $b,$ this means that $c$ is a divisor of both $b$ and $r.$ Therefore, the set of positive integers that divide both $a$ and $b$ is a subset of the set of positive integers that divide both $b$ and $r.$
If the integer $k$ divides both $b$ and $r$ then there are integers $b''$ and $r''$ so that $b = k \cdot b''$ and $r = k \cdot r''.$ Substitute these two new expressions in the equation $a = q \cdot b + r$ to get \begin{equation} \begin{aligned} a {} & = q \cdot b + r \\ & = q \cdot (k \cdot b'') + (k \cdot r'') \\ & = k \cdot (q \cdot b'') + k \cdot r'' \\ & = k \cdot (q \cdot b'' + r'') \end{aligned} \end{equation} which shows that $k$ is also a divisor of $a.$ Since we already assumed that $k$ divides $b,$ this means that $k$ is a divisor of both $a$ and $b.$ Therefore, the set of positive integers that divide both $b$ and $r$ is a subset of the set of positive integers that divide both $a$ and $b.$

So we have two sets, each of which is a subset of the other, which means that the two sets must be equal. That is, the set of positive integers that divide both $a$ and $b$ is equal to the set of positive integers that divide both $b$ and $r;$ more plainly, the two set descriptions define the same set.

Q.E.D.
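The lemma can be checked directly for the example $a = 126,$ $b = 35.$ The sketch below lists both sets of common divisors by brute force; the helper name `common_divisors` is made up for this example, and the check is an illustration rather than a replacement for the proof.

```python
def common_divisors(x, y):
    """Return the set of positive integers that divide both x and y,
    where x > 0 and y >= 0 (every positive integer divides 0)."""
    return {d for d in range(1, max(x, y) + 1) if x % d == 0 and y % d == 0}

a, b = 126, 35
q, r = divmod(a, b)     # quotient q = 3 and remainder r = 21
assert a == q * b + r and 0 <= r < b

# The two sets named in the lemma are equal.
print(common_divisors(a, b) == common_divisors(b, r))
```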

From the lemma we can conclude that the greatest common divisor of $a$ and $b$ is equal to the greatest common divisor of $b$ and $r.$

We are now ready to prove the main result by induction.

Theorem

The Euclidean Algorithm correctly computes the greatest common divisor (g.c.d.) of the two positive integers $a$ and $b.$

Proof

We use mathematical induction on the number of times $n$ we must compute a new remainder (that is, the number $n$ of iterations of the code block inside the loop), and prove that the algorithm computes the correct g.c.d. no matter what the value of $n$ is.

Let $P(n)$ be the predicate "If the loop executes $n$ times, then the last nonzero remainder is the g.c.d. of the two initial inputs." We will prove that $(\forall n \in \mathbb{N})P(n)$ is True.

Basis Step: Notice that the number of iterations of the loop is $n=0$ if and only if the value of $r = a\%b$ is equal to $0$, which is True if and only if $b$ divides $a.$ This means that the "last nonzero remainder" is $b,$ and $b$ is the greatest common divisor of $a$ and $b,$ which means that $P(0)$ is True.

We can also prove that $P(1)$ is True in case the proof of $P(0)$ is unsatisfying. In the case when $n=1,$ the loop executes $1$ time, which means that $a = q \cdot b + r$ and $r$ is a nonzero divisor of $b,$ so $r$ is the g.c.d. of $b$ and $r,$ and we can use the lemma to conclude that $r$ is also the g.c.d. of $a$ and $b.$ This proves that $P(1)$ is True.

Induction Step: First, we assume that the induction hypothesis $P(k)$ is True for some positive natural number $k$.

Assume that $P(k)$ is True for the positive integer $k,$ that is, if the loop executes $k$ times, then the last nonzero remainder is the g.c.d. of the two numbers we started with. We can assume $k \geq 1$ since the cases when $k \in \{ 0, 1 \}$ were proved in the Basis Step. Suppose that we have numbers $a$ and $b$ such that the loop executes $k+1$ times in order to reach the last nonzero remainder: We need to prove that this last nonzero remainder is actually the g.c.d. of $a$ and $b.$ Now, notice that if we find the first remainder so that $r = a - q \cdot b$ and $0 < r < b,$ then the Euclidean Algorithm requires $k$ loop iterations to find the last nonzero remainder for the pair of inputs $b$ and $r.$ That is, we know from the induction hypothesis that the last nonzero remainder for the initial values $b$ and $r$ is the g.c.d. of $b$ and $r.$ Now apply the lemma to conclude that the g.c.d. of $a$ and $b,$ computed after $k+1$ loop iterations, is equal to the g.c.d. of $b$ and $r,$ computed after $k$ loop iterations. Therefore, $P(k+1)$ is True, too. This proves that $P(k) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that for all natural numbers $n,$ if the loop executes $n$ times then the last nonzero remainder is the g.c.d. of $a$ and $b.$ That is, the Euclidean Algorithm correctly computes the g.c.d. of $a$ and $b$ no matter how many loop iterations are required to compute the last nonzero remainder.

Q.E.D.

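Notice that if $a = f_{n+3}$ and $b = f_{n+2}$ are consecutive Fibonacci numbers then the Euclidean Algorithm requires exactly $n$ loop iterations to compute the last nonzero remainder. For example, $a = f_{5} = 5$ and $b = f_{4} = 3$ require $2$ loop iterations: the remainders are $2,$ $1,$ and then $0.$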
The French mathematician Gabriel Lamé proved in 1844 that if the Euclidean Algorithm requires $n$ loop iterations to compute the last nonzero remainder for two given positive integer inputs $a$ and $b$ (with $a>b$) then it must be True that both $f_{n+2} \leq a$ and $f_{n+1} \leq b.$ A proof by induction of Lamé’s theorem is given at this Wikipedia page.
The "worst-case complexity" for the Euclidean Algorithm is described at that webpage as well: The Euclidean Algorithm is $O(\log_{\phi}(b)).$ (Complexity and "big $O$" notation are discussed in the Rates of Growth of Functions chapter.)

11.6. Strong Induction

Strong induction is used when it is easier to use the assumption that all the propositions $P(0), P(1), \ldots , P(n-1)$ are True in order to prove that $P(n)$ is True.

Basis Step	Prove the predicate $P(n)$ is True for one or more consecutive small values of $n;$ in most but not all cases, you prove both $P(0)$ and $P(1).$
Induction Step	Prove that the conditional statement $( P(0) \land P(1) \land \cdots \land P(k) ) \rightarrow P(k+1)$ is True for any natural number $k.$ Using modus ponens allows you to infer that $P(k+1)$ must also be True based on the assumption that all of the propositions $P(0), P(1), \ldots , P(k)$ are True.
Conclusion Step	Conclude that $P(n)$ is True for all natural numbers $n$ that are greater than or equal to the smallest value used in the Basis Step.

In spite of the name, strong induction and "weak" induction are equivalently powerful techniques in the sense that any proposition that you can prove using strong induction can also be proven by "weak" induction, and any proposition that you can prove using "weak" induction can also be proven by strong induction. The choice of which of the two proof techniques to use is based on convenience only, not power.

Example 9 - An Upper Bound for the Fibonacci numbers

Recall that the Fibonacci numbers are defined by the following recurrence relation: \[f_{0}=0, \, f_{1}=1, \text{ and } f_{n} = f_{n-1} + f_{n-2} \text{ for } n \geq 2.\]

We will prove that the predicate \[P(n): f_{n} < \displaystyle \left( \frac{1+\sqrt{5}}{2} \right)^{n}\] is True for all natural numbers $n.$ The proof will use strong induction because, for each $n \geq 2,$ the value of $f_{n}$ is defined in terms of both $f_{n-1}$ and $f_{n-2}.$
The number on the right-hand side of the inequality is a power of a famous constant called the golden ratio, which is usually denoted by the lowercase Greek letter $\phi$ ("phi"): \[ \phi = \frac{1+\sqrt{5}}{2} \] $\phi$ is an irrational number whose first few digits are $1.618 \ldots .$ Also, $\phi$ is the positive solution of the equation $x^{2} = x + 1,$ which is a fact that we will use in the proof. The other solution of $x^{2} = x + 1$ is $1 - \phi,$ a fact you can use when you attempt the Challenge question at the end of this example. \[ 1 - \phi = \frac{1-\sqrt{5}}{2} \] Notice that you can verify that $\phi$ and $1-\phi$ are the two roots of $x^{2} - x - 1 = 0$ by using the quadratic formula.

Basis Step: In this case, we need to prove that the conjunction $P(0) \land P(1)$ is True as our basis for strong induction.

$P(0)$ is the proposition $f_{0} < \phi^{0}$ which is True since $0 < 1.$

$P(1)$ is the proposition $f_{1} < \phi^{1}$ which is True since $1 < \phi.$

Since $P(0)$ and $P(1)$ are True, you can use the tautology $q \rightarrow ( r \rightarrow (q \land r) )$ to conclude that $P(0) \land P(1)$ is True, too.

Induction Step: First, we assume as the induction hypothesis that \[P(i) \text { is True for all natural numbers } i \leq k \] where we can assume that the integer $k$ is greater than or equal to $1$ (since $P(0)$ and $P(1)$ were already dealt with in the Basis Step, the first proposition that still needs a proof is $P(2),$ which corresponds to $k = 1$). That is, we assume that the single proposition $P(0) \land P(1) \land \cdots \land P(k)$ is True, where $k$ is some integer greater than or equal to 1.

Secondly, we will prove that the conditional $( P(0) \land P(1) \land \cdots \land P(k) ) \rightarrow P(k+1)$ must be True, which means we can use modus ponens to show that $P(k+1)$ is also True.

If the inequality $f_{i} < \phi^{i}$ is True for each $i \leq k,$ then \begin{equation} \begin{aligned} f_{k+1} {} & = f_{k} + f_{k-1} \\ & < \phi^{k} + \phi^{k-1} \text{ by the induction hypothesis} \\ & = \phi^{k-1} \cdot (\phi + 1) \text{ by algebra} \\ & = \phi^{k-1} \cdot \phi^{2} \text{ since } \phi^{2} = \phi + 1 \\ & = \phi^{(k-1)+2} \text{ using one of the laws of exponents} \\ & = \phi^{k+1} \end{aligned} \end{equation}

We have proven that if $f_{i} < \phi^{i}$ is True for each natural number $i \leq k,$ then $f_{k+1} < \phi^{k+1}$ must also be True. That is, we have proven $( P(0) \land P(1) \land \cdots \land P(k) ) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that for all natural numbers $n,$ $f_{n} < \phi^{n}.$

Q.E.D.
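The bound can be spot-checked numerically for small values of $n.$ The sketch below uses floating-point arithmetic for $\phi,$ so it is only an approximate check, and the cutoff of 40 terms is an arbitrary choice. The names `PHI` and `fib_list` are made up for this example.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2   # floating-point approximation of the golden ratio

def fib_list(count):
    """Return the list [f_0, f_1, ..., f_(count-1)] built from the recurrence."""
    f = [0, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]

# The fact used in the Induction Step: phi^2 = phi + 1 (up to rounding error).
assert abs(PHI**2 - (PHI + 1)) < 1e-12

# Check f_n < phi^n for n = 0, 1, ..., 39.
for n, f_n in enumerate(fib_list(40)):
    assert f_n < PHI**n

print(fib_list(10))
```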

Challenge

Use strong induction to prove that the closed form of the Fibonacci numbers is \[ f_{n} = \displaystyle \frac{\phi^{n} - (1-\phi)^{n}}{\sqrt{5}} \text{ for all natural numbers }n. \]

Hint

For the Induction Step, use the fact that both $\phi^{2} = \phi + 1$ and $(1-\phi)^{2} = (1-\phi) + 1$ are True and use steps similar to the ones in the Induction Step in the proof above but working with equations instead of inequalities. Using the two equations in the previous sentence will help you avoid working directly with expressions like \[f_{n} = \displaystyle \frac{ \left( \displaystyle \frac{1 + \sqrt{5}}{2} \right)^{n} - \left( \displaystyle \frac{1 - \sqrt{5}}{2} \right)^{n}}{\sqrt{5}}\] or \[ f_{n} = \displaystyle \frac{ \left( 1 + \sqrt{5} \right)^{n} - \left( 1 - \sqrt{5} \right)^{n} }{ 2^{n} \sqrt{5} } \] which would be unnecessarily difficult and time-consuming.

Next, we’ll prove the following theorem.

The Fundamental Theorem of Arithmetic

Every positive integer $n$ that is greater than or equal to $2$ is either a prime number or can be written as a product of two or more prime numbers.

Proof

Let $P(n)$ be the predicate \[P(n) \text{: "} n \text{ is either prime or the product of two or more primes."}\] We will prove that $(\forall n \in \mathbb{N}_{\geq 2})P(n),$ that is, each positive integer $n \geq 2$ is either a prime or a product of two or more prime numbers.

Basis Step: $P(2)$ is True since $2$ is a prime number. In this case, we could use only $P(2)$ as the basis, but it is easy to prove $P(3)$ and $P(4)$ are True since $3$ is prime and $4 = 2 \cdot 2$ is a product of two primes.

Induction Step: First, we assume as the induction hypothesis that \[P(i) \text { is True for all positive natural numbers } i \text{ such that } 2 \leq i \leq k \] where we can assume that the integer $k$ is greater than or equal to $4$ (since the cases where $k \in \{ 2, 3, 4 \}$ were already dealt with in the Basis Step.) That is, we assume that the single proposition $P(2) \land P(3) \land \cdots \land P(k)$ is True, where $k$ is some integer greater than or equal to 4.

Secondly, we will prove that the conditional $( P(2) \land P(3) \land \cdots \land P(k) ) \rightarrow P(k+1)$ must be True, which means we can use modus ponens to show that $P(k+1)$ is also True.

There are two cases: Either $k+1$ is prime or it is not prime (that is, it is composite.)

If $k+1$ is prime, then $P(k+1)$ is True immediately, since a prime number satisfies the predicate.
If $k+1$ is not prime, then there are two integers $a$ and $b$ that are both greater than $1$ such that $k+1 = ab.$ Notice that both $a$ and $b$ must be less than $k+1$ because if either one were greater than or equal to $k+1$ then the product $ab$ would be greater than or equal to $2(k+1),$ which is greater than $k+1.$ Since $2 \leq a \leq k$ and $2 \leq b \leq k,$ the induction hypothesis tells us that both $P(a)$ and $P(b)$ are True, so each of $a$ and $b$ is either a prime or a product of two or more primes, which means that the product $ab$ is a product of at least two primes (if both $a$ and $b$ are primes, they are the only two prime factors of $k+1;$ otherwise, there will be more than two prime factors of $k+1$). Since $k+1 = ab$, this proves that $P(k+1)$ is True in the case when $k+1$ is a composite number.

In either case, we have shown that $P(k+1)$ must be True. We have proven that if $P(i)$ is True for each natural number $i$ with $2 \leq i \leq k,$ then $P(k+1)$ must also be True. That is, we have proven $( P(2) \land P(3) \land \cdots \land P(k) ) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that for all natural numbers $n \geq 2,$ $n$ is either a prime number or a product of two or more prime numbers.

Q.E.D.
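The structure of this proof translates naturally into a recursive function. In the sketch below, the prime case returns immediately and the composite case makes two recursive calls on the smaller factors $a$ and $b,$ which play the role of the induction hypothesis for $P(a)$ and $P(b).$ The function name `prime_factors` and the trial-division search for a factor are assumptions made for this illustration; they are not part of the proof.

```python
def prime_factors(n):
    """Return a list of primes whose product is n, for an integer n >= 2."""
    # Look for a factorization n = a * b with 1 < a <= b < n.
    a = 2
    while a * a <= n:
        if n % a == 0:
            b = n // a
            # Composite case: a and b are both smaller than n, so the
            # recursive calls correspond to using P(a) and P(b).
            return prime_factors(a) + prime_factors(b)
        a = a + 1
    # No such factorization exists, so n is prime.
    return [n]

print(prime_factors(360))   # [2, 2, 2, 3, 3, 5]
```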

Exercise - Setting Up A Proof By Strong Induction (Counting the Edges of a Rooted Tree)

Set up the proof of the following statement.

The number of edges of a rooted tree is one less than the number of vertices of the rooted tree.

What is the predicate $P(n)$?

What value(s) of $n$ is/are best to use in the Basis Step? That is, what is/are the smallest value(s) of $n$ for which the statement/predicate could be True?

What is the Induction Hypothesis?

Suppose that you have a drawing of a rooted tree that has $k+1$ vertices. How can you use the recursive definition of rooted trees to apply the Induction Hypothesis?

11.7. Validating A Recursive Algorithm Using Strong Induction

In this section, we’ll use strong induction to prove the correctness of a recursive algorithm. That is, we will show that the algorithm must work as intended.

Strong induction is useful for proving properties of recursively-defined objects since each use of the Recursion Step can depend on any previously completed steps instead of just one previous step.

11.7.1. The MergeSort Algorithm

In this subsection, we describe the MergeSort algorithm, which is an example of a “divide-and-conquer” algorithm that breaks up a large problem into two or more smaller versions of the same problem and then combines the solutions of the smaller problems to find a solution to the original large problem.

Here is a description of the MergeSort algorithm.

MergeSort

Task: Sort the numbers in a list $L$ in increasing order. The numbers can be integers or real numbers.
Input: The list $L.$
Steps:
1. If the length of list $L$ is less than $2,$ return $L.$
2. Set $c$ to the ceiling of half the length of list $L.$
3. Define the list $L_{\text{left}}$ to be the list of the first $c$ elements of list $L,$ in the current order, and
  define $L_{\text{right}}$ to be the list of the remaining elements of list $L$, in the current order.
4. Define $L_{\text{sorted}}$ to be the list returned by the Merge algorithm called with the two input lists $MergeSort(L_{\text{left}})$ and $MergeSort(L_{\text{right}}).$
5. Copy the values to $L$ from $L_{\text{sorted}}.$
6. Return $L.$
Output: The list $L,$ sorted in increasing order.

The MergeSort algorithm is recursive:

Basis Step	The shortest lists, ones that are either empty or have only a single element, are processed at Step 1 of the algorithm. These lists are considered to be sorted by definition.
Recursion	Longer lists are processed starting at Step 2 of the algorithm. In particular, notice that at Step 4, the algorithm calls on a second algorithm, Merge, after recursively calling itself twice, once for each of the shorter lists $L_{\text{left}}$ and $L_{\text{right}}.$

Of course, we will need to describe precisely what the Merge algorithm does, which we will do below. An implementation of both the MergeSort and Merge algorithms in Python will be given, too. More detail about MergeSort, along with a different Python implementation, can be found in the Algorithms and Their Analysis chapter.

Next, we describe the Merge algorithm, which is designed to combine two ordered lists into a single ordered list.

Merge

Task: Create a new list $M$ that is sorted in increasing order and contains all the numbers in two given lists of numbers, $A$ and $B,$ that already were sorted in increasing order.
Input: The lists $A$ and $B.$ The numbers in each list can be integers or real numbers.

Steps:
1. Define a list $M$ and initialize it as the empty list.
2. Define index variables $i$ and $j$ and initialize both to $0.$
3. While there are elements in both $A$ and $B$ not yet appended to $M$
  1. If $A[i]$ is less than or equal to $B[j]$
    
    Append $A[i]$ to $M$
    
    Increment $i$ by $1$
  2. Otherwise, $B[j]$ must be less than $A[i]$ so
    
    Append $B[j]$ to $M$
    
    Increment $j$ by $1$
4. If all the elements of $A$ have been appended to $M,$ then append all remaining elements of $B$ to $M.$
5. If all the elements of $B$ have been appended to $M,$ then append all remaining elements of $A$ to $M.$
6. Return $M.$
Output: List $M,$ containing all the numbers in the two lists $A$ and $B,$ sorted in increasing order.

The following Python code implements both MergeSort and Merge. Try tracing through the code to see how it works. This code is a bit longer than most samples in this textbook, so it may be easier to view and trace it in a new tab or window by clicking on the link labeled “Edit in PythonTutor.”
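As a minimal sketch, the numbered steps of the two algorithm descriptions above can be written in Python as follows. The function names `merge` and `merge_sort` are assumptions chosen for this sketch, so the code behind the PythonTutor link may differ in its details.

```python
def merge(a, b):
    """Merge two lists that are already sorted in increasing order."""
    m = []                      # Step 1: start with an empty list M
    i, j = 0, 0                 # Step 2: index variables for A and B
    # Step 3: both lists still have elements not yet appended to M
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            m.append(a[i])
            i += 1
        else:                   # here b[j] < a[i]
            m.append(b[j])
            j += 1
    # Steps 4 and 5: at most one of these two slices is nonempty
    m.extend(b[j:])
    m.extend(a[i:])
    return m                    # Step 6


def merge_sort(lst):
    """Sort lst in increasing order, following the MergeSort steps."""
    if len(lst) < 2:            # Step 1: lists of length 0 or 1
        return lst
    c = (len(lst) + 1) // 2     # Step 2: ceiling of half the length
    left, right = lst[:c], lst[c:]                              # Step 3
    sorted_list = merge(merge_sort(left), merge_sort(right))    # Step 4
    lst[:] = sorted_list        # Step 5: copy the values back into L
    return lst                  # Step 6


print(merge([1, 4, 9], [2, 3, 10]))     # [1, 2, 3, 4, 9, 10]
print(merge_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]
```

The integer expression `(len(lst) + 1) // 2` is used for the ceiling in Step 2 so that no floating-point arithmetic is needed.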

Edit in PythonTutor

Now that you have had the opportunity to see how MergeSort and Merge work as implemented in Python, you are ready to try to prove the correctness of both algorithms.

First, let’s discuss a proof of correctness for the Merge algorithm.

Lemma

For any two finite-length lists of numbers that are sorted in increasing order, the Merge algorithm correctly constructs a list $M$ that is sorted in increasing order and contains all the numbers in the two lists.

Proof

This proof is an exercise for you!

Notice that if you assume each list is nonempty then after one element has been appended to $M$ you can now treat whichever list contained that value as if it is a list with one less element (that is, you can treat that list as if the smallest element had been removed from the list and all other elements were re-indexed.) This suggests that using mathematical induction (not strong induction) where $n$ is equal to the sum of the lengths of the two input lists will work.

Suggestions:

First, write down the predicate $P(n)$ that you need to prove.

For the Basis Step, look at the three cases $n = 0$ (both lists are empty,) $n = 1$ (one list is empty, the other is not,) and $n = 2$ (either each list contains one element, or one list is empty and the other contains $2$ elements.) These cases are handled differently by the Merge algorithm (different Steps/parts of the Python code are used in each case,) but should cover all the possible ways the algorithm will handle short lists.

Write down the Induction Hypothesis $P(k).$

For the Induction Step, assume that the sum of the lengths of the two input lists is $k+1,$ and use the idea discussed above: After the first element is appended to $M,$ the sum of the lengths of the lists is reduced to $k,$ the case you assume can be handled correctly by the algorithm.

Next, let’s prove the correctness of the MergeSort algorithm.

Theorem

For any finite-length list of numbers, the MergeSort algorithm sorts the list in increasing order.

Proof

Let $P(n)$ be the predicate \[P(n) \text{: “For any list of numbers that has length } n \text{, } \text{the } MergeSort \\ \text{ algorithm sorts the list in increasing order.”}\] We will prove that $(\forall n \in \mathbb{N})P(n),$ that is, the MergeSort algorithm works as intended for any list $L$ that has finite length.

Basis Step: Both $P(0)$ and $P(1)$ are True since the MergeSort algorithm returns the original list unchanged if the length of the list is less than or equal to $1.$ (An empty list can be considered as sorted for our purposes, and a list that contains only a single element already is sorted in increasing order.)
$P(2)$ is True, too, since the MergeSort algorithm will invoke the Merge algorithm, which will not change the order of the elements of the list if the two elements already are in increasing order but will swap the two elements otherwise; in either case the output of the MergeSort algorithm is the original list sorted in increasing order.

Induction Step: First, we assume as the induction hypothesis that \[P(i) \text { is True for all natural numbers } i \text{ such that } 0 \leq i \leq k \] or, in words, “For any list of numbers that has length less than or equal to $k,$ the MergeSort algorithm sorts the list in increasing order,” where we can assume that the integer $k$ is greater than or equal to $2$ (because lists of length $0,$ $1,$ and $2$ were already dealt with in the Basis Step.) That is, we assume that the single proposition $P(0) \land P(1) \land \cdots \land P(k)$ is True, where $k$ is some integer greater than or equal to $2.$

Assume that we have a list $L$ that has length $k+1.$ Notice that when we split the list $L$ as described in the algorithm, each of the resulting left and right sublists must have length less than $k+1,$ so the Induction Hypothesis applies to those sublists: The MergeSort algorithm correctly sorts the left and right sublists. From the lemma, we know that when Merge is invoked on the two sublists, it creates a list that is sorted in increasing order that contains all the elements of the original input list. The list that is output by Merge is then copied element-by-element to the original input list $L,$ resulting in a list that is sorted in increasing order and that includes all the elements of the unsorted input list.
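The key fact used in this step is that both sublists are strictly shorter than $L$ whenever $L$ has length at least $2.$ As a small sanity check (not a substitute for the proof), the sketch below computes the two sublist lengths produced by Steps 2 and 3 of MergeSort. The helper name `split_lengths` is an assumption made for illustration.

```python
def split_lengths(n):
    """Return (left length, right length) when a list of length n is split."""
    c = (n + 1) // 2        # ceiling of n / 2, as in Step 2 of MergeSort
    return c, n - c


# For every length n >= 2, both sublists are nonempty and strictly shorter
# than n, so the Induction Hypothesis applies to each of them.
for n in range(2, 1000):
    left, right = split_lengths(n)
    assert 1 <= left < n and 1 <= right < n
    assert left + right == n

print(split_lengths(7))   # (4, 3)
```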

We have proven that if $P(i)$ is True for each natural number $i$ with $0 \leq i \leq k,$ then $P(k+1)$ must also be True. That is, we have proven $( P(0) \land P(1) \land \cdots \land P(k) ) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that for all natural numbers $n,$ the MergeSort algorithm sorts any input list of length $n$ in increasing order. Since this is True for every natural number value of $n,$ we have successfully created a proof of correctness for the MergeSort algorithm: For any finite-length list of numbers, the MergeSort algorithm sorts the list in increasing order.

Q.E.D.

11.8. Exercises

Prove by induction:
1. For all $n ≥ 1, $ $1+2+3+\ldots+n=\displaystyle\sum_{i=1}^{n}i=\displaystyle \frac{n\left(n+1\right)}{2}$
2. For all $n ≥ 1, $ $1^2+2^2+3^2+\ldots+n^2=\displaystyle\sum_{i=1}^{n}i^2=\frac{1}{6} n (n+1) (2 n+1)$
3. For all $n ≥ 1, $ $1^3+2^3+3^3+\ldots+n^3=\displaystyle\sum_{k=1}^{n}k^3=\frac{1}{4} n^2 (n+1)^2$
4. For all $n ≥ 1, $ ${23}^n-1$ is divisible by 11.
Prove by induction that $n^2+n = n(n+1)$ is even for all integers $n ≥ 1$.
Find an appropriate $N \in \mathbb{Z}$, and prove by induction that $n^3 +3n^2$ is even for all $n ≥ N$.
Find an appropriate $N \in \mathbb{Z}$, and prove by induction that $n^3 +2n$ is divisible by 3 for all $n ≥ N$. (Hint: You may use the result that $n(n+1)$ is even for every integer $n$.)
Prove by induction that $7$ divides $2^{4n+2} + 3^{2n+1}$ for all nonnegative integers $n$.
Prove that for any $n ≥ 1$ and $x ≥ 0$ that $\left(1+x\right)^n\geq1+nx$.
For all $n ≥ 5$, prove that $n^2 < 2^n$
Graph $n!$ and $2^n$, and then prove by induction that $ 2^n < n!$ for $n>3$.
Graph $n^3$ and $5n+12$, and then use your graph to find an appropriate $N \in \mathbb{Z}$ to prove by induction that $5n+12 < n^3$ whenever $n>N$.
Prove by induction that a set $A$ with cardinality $|A|=n$ has $2^n$ subsets.
Prove by induction that there are $3^n$ numbers in base 3 (using the digits 0 ,1, 2) made up of $n$ digits.
Prove by induction that there are $4^n$ numbers in base 4 (using the digits 0 ,1, 2, 3) made up of $n$ digits.
State the principle of mathematical induction using a conditional logical statement.
Consider the sequence defined recursively as \[a_1=1,a_2=5, \text{ and } a_n=5a_{n-1}-6a_{n-2}\]
1. Calculate the first eight terms of the recursive sequence.
2. Prove by induction that the closed-form formula for the sequence is $a_{n} = 3^{n} - 2^{n}.$
  (Hint: You can use the fact that $2$ and $3$ are the solutions of the quadratic equation $x^{2} = 5x - 6.$)
Consider the sequence defined recursively as \[a_1=1 \text{ and } a_n=2a_{n-1}+n\]
1. Calculate the first eight terms of the recursive sequence
2. Prove by induction that the recursive sequence is given by the formula $a_n=4\cdot 2^{n-1}-n-2$.
Recall that the Fibonacci numbers are defined by the following recurrence relation: \[f_{0}=0, \, f_{1}=1, \text{ and } f_{n} = f_{n-1} + f_{n-2} \text{ for } n \geq 2.\]
1. Prove by induction that \[ f_0+f_1+f_2+\ldots+f_n= f_{n+2}-1. \]
2. Prove by induction that \[ f_0^2+f_1^2+f_2^2+ \cdots + f_n^2 =f_n \cdot\ f_{n+1}. \]

12. Rates of Growth of Functions

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on April 14, 2025.
Contents locked until 11:59 p.m. Pacific Standard Time on May 23, 2025.

You have seen that some tasks can be completed by more than one algorithm. Two questions to ask are

"How do you choose which algorithm to use?"
"Why is it important to make such a choice?"

This chapter will discuss tools you can use to help answer these questions. In particular, ways of comparing the rates of growth of functions will allow us to compare how two algorithms perform as the size of their input increases with no upper bound.

Key terms and concepts covered in this chapter:

Complexity
Big-$\Theta$ notation
Big-$O$ notation

12.1. Complexity of Algorithms

In order to implement an algorithm, there are issues of the space needed to do the work and the time needed to complete all the steps.

For example, imagine that you are asked to complete a few Algebra homework exercises by hand, and each exercise involves solving linear equations by hand using paper and pencil. Now suppose that "few" means "one hundred" and that each linear equation involves multiple steps to solve like the one below. \[ \text{Exercise 1. Solve for } x \text{: } 40(x+6)-9(3x+5) = 65(x+7)-(7x+132)\] It is not difficult to solve linear equations like this one because it is clear what steps you need to use… but it is tedious and will likely require a lot of paper! That is, it will consume a lot of time and space to solve even the first one of these equations, and you’ll have only ninety-nine more to do after that!

In this textbook, the focus will be on time complexity and asymptotics, that is, the comparison of the time needed by the algorithms as the size of the input becomes larger and larger without any bound.

12.2. The Order of a Function and Big Theta Notation

In this section, we will define a relation that describes what it means to say that "two functions grow at the same rate." More precisely, "two functions grow at the same rate, asymptotically, as the input variable grows without an upper bound." We will also introduce big $\Theta$ notation.
Note: $\Theta$ is the uppercase Greek letter "Theta."

Definition

Suppose that $f$ and $g$ are two functions, both having domain $\mathbb{R}$ and codomain $\mathbb{R}$. Define the relation f has the exact same order as g to mean that

There exist positive real number constants $A,$ $B,$ and $x_{0}$ so that \[ A|g(x)| \leq |f(x)| \leq B|g(x)| \text{ for all } x > x_{0}.\]

The constants $A,$ $B,$ and $x_{0}$ are called witnesses: The constants confirm the "has the exact same order as" relationship between the two functions.
The Remix’s definition of "has the exact same order as" is based on notations and descriptions proposed by Donald Knuth in the 1976 letter "Big Omicron and Big Omega and Big Theta" published in ACM SIGACT News.

Illustration: Two functions that have the exact same order

Consider the functions $f$ and $g,$ each with domain $\mathbb{R}$ and codomain $\mathbb{R},$ described by the rules \[ f(x) = x + \sin(x), \, g(x) = x.\] The function g is linear and the function f is not, but f can be thought of as asymptotically linear in the sense that its growth is more like that shown in a straight line plot of a linear function than, say, a parabolic plot for a quadratic function or a square root function. This is what we are describing with the relation "has the exact same order as."

Let’s plot the graphs of f and two constant multiples of g to illustrate what the relation "has the exact same order as" means.

The preceding image shows the plots of the graphs of the functions $f,$ $0.9g,$ and $1.1g.$ Notice that for x between -10 and 10, the corresponding y output for f is sometimes between the two straight lines, sometimes above both lines, and sometimes below both lines.

However, if we zoom out, it appears that the plotted graph of $f$ lies between the two straight lines for all values of the input x that have a large absolute value. In fact, the image seems to indicate that \[ 0.9g(x) \leq f(x) \leq 1.1g(x) \text{ for all } x \geq 100.\] That is, the image suggests that f has the exact same order as g. Notice that we could prove, more rigorously, that the inequality is true, by using some algebra and the fact that the sine function’s outputs are in the interval $[-1, \, 1],$ so the functions have the exact same order. The two functions $f$ and $g$ grow at the same rate, asymptotically, as the input variable grows without an upper bound.
It appears from the zoomed-out plot that we could choose a value less than 100 for the bound $x_{0},$ but the definition of "has the exact same order as" does not require us to find optimal values of any of the constants $A,$ $B,$ and $x_{0}.$ In fact, if there exists at least one ordered triple $( A, \, B, \, x_{0})$ that witnesses the "has the exact same order as" relation between two functions, then there must be infinitely many other ordered triples that witness the same relationship. As an example, we’ve used the ordered triple $( 0.9, \, 1.1, \, 100)$ for the two multipliers and the lower bound for inputs, but we could use the same two multipliers and any value greater than 100 for $x_{0}$ instead. Also, the triple $( \frac{1}{4}, \, 4, \, 0)$ witnesses the same relationship since \[ \frac{1}{4}g(x) \leq f(x) \leq 4g(x) \text{ for all } x \geq 0.\]
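The witness triples mentioned above can be spot-checked numerically. The following is a minimal sketch: the helper name `witnesses_hold`, the step size, and the number of sample points are all assumptions made for illustration, and checking finitely many sample points only suggests that the inequality holds; it does not prove it.

```python
import math


def f(x):
    return x + math.sin(x)


def g(x):
    return x


def witnesses_hold(A, B, x0, step=0.5, count=2000):
    """Check A|g(x)| <= |f(x)| <= B|g(x)| at sample points x > x0."""
    for k in range(1, count + 1):
        x = x0 + k * step
        if not (A * abs(g(x)) <= abs(f(x)) <= B * abs(g(x))):
            return False
    return True


print(witnesses_hold(0.9, 1.1, 100))   # True
print(witnesses_hold(0.25, 4, 0))      # True
print(witnesses_hold(0.9, 1.1, 0))     # False: fails for small x such as 0.5
```

The last call shows why the multipliers $0.9$ and $1.1$ need a larger bound $x_{0}$ than the multipliers $\frac{1}{4}$ and $4$ do.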

We have the following theorem about the "has the exact same order as" relation.

Theorem

For any functions $f,$ $g,$ and $h$ with domain $\mathbb{R}$ and codomain $\mathbb{R},$

(1) $f$ has the exact same order as $f,$
(2) if $f$ has the exact same order as $g,$ then $g$ has the exact same order as $f,$
(3) if $f$ has the exact same order as $g,$ and $g$ has the exact same order as $h,$ then $f$ has the exact same order as $h.$

Proof

For statement (1), choose any value $x_{0}$ that is in the domain of $f$ and $A = 1$ and $B = 1$ as witnesses. Since \[1 \cdot |f(x)| \leq |f(x)| \leq 1 \cdot |f(x)| \text{ for all } x \geq x_0\] must be True, $f$ has the exact same order as $f.$ Notice that we could have used other values for the witnesses $A$ and $B$ such as $A = 0.99$ and $B = 1.01.$

For statement (2), assume that $f$ has the exact same order as $g,$ so there are positive real number constants $A,$ $B,$ and $x_{0}$ such that \[A|g(x)| \leq |f(x)| \leq B|g(x)| \text{ for all } x > x_{0}.\] Notice that the extended inequality above can be broken into the two inequalities \[A|g(x)| \leq |f(x)| \text{ and } |f(x)| \leq B|g(x)|\] which are both True for all $x > x_{0}.$ The two inequalities can be rewritten as \[|g(x)| \leq \frac{1}{A}|f(x)| \text{ and } \frac{1}{B}|f(x)| \leq |g(x)|\] which shows that \[\frac{1}{B}|f(x)| \leq |g(x)| \leq \frac{1}{A}|f(x)| \text{ for all } x > x_{0}.\] The last extended inequality above shows that $g$ has the exact same order as $f.$

For statement (3), assume both that $f$ has the exact same order as $g$ and that $g$ has the exact same order as $h.$ This means that there are positive real number constants $A,$ $B,$ and $x_{0}$ such that \[A|g(x)| \leq |f(x)| \leq B|g(x)| \text{ for all } x > x_{0}\] and also positive real number constants $C,$ $D,$ and $x_{1}$ such that \[C|h(x)| \leq |g(x)| \leq D|h(x)| \text{ for all } x > x_{1}.\] By breaking up the extended inequalities, then doing some algebra and recombining inequalities, you can get \[AC|h(x)| \leq A|g(x)| \leq |f(x)| \text{ and } |f(x)| \leq B|g(x)| \leq BD|h(x)|\] which are True for all $x > \max(x_{0}, x_{1}).$ So \[AC|h(x)| \leq |f(x)| \leq BD|h(x)| \text{ for all } x > \max(x_{0}, x_{1})\] which shows that $f$ has the exact same order as $h,$ witnessed by the constants $AC,$ $BD,$ and $\max(x_{0}, x_{1}).$

These three properties let you conclude that the "has the exact same order as" relation is an equivalence relation, so the relation partitions the set $S = \{ f \, | \, f \text{ is a function with domain and codomain } \mathbb{R} \}$ into disjoint sets. For each function $g \in S$ we can define $\Theta(g)$ to be the equivalence class \[ \Theta(g) = \{ f \, | \, f \text{ has the exact same order as } g \} \] Every function with domain and codomain $\mathbb{R}$ is an element of at least one of the $\Theta(g)$ and for any two functions $g$ and $h,$ the sets $\Theta(g)$ and $\Theta(h)$ must either be equal or have empty intersection. For example, the earlier example shows that $\Theta(x + \sin x)$ and $\Theta(x)$ are the same set, so we can say that the function $f(x) = x + \sin x$ is of linear order.

Mathematicians and computer scientists are very different beasts… well, they are all human but they have developed different cultures so they often use the same symbols in different ways.

A mathematician, like the author of the Remix, would write the very formal $f \in \Theta(g)$ and state "f is an element of Theta g" to mean that "f has the exact same order as g." In the earlier example, a mathematician could abbreviate this a little bit and write "$x + \sin(x)$ is in $\Theta(x).$"

Computer scientists have traditionally written this relation as $f(x) = \Theta(g(x))$ and state "$f(x)$ is big Theta of $g(x)$." In the earlier example, a computer scientist could write "$x + \sin(x) = \Theta(x)$." As a mathematician, I need to point out that the function f is not equal, in the mathematical sense, to the equivalence class containing g because it’s just one of the infinitely many functions in that equivalence class.

I believe that both mathematicians and computer scientists agree that $\Theta(g(x)) = f(x)$ is just too hideous a notation to use… so please do not ever, ever use it!

12.3. Big O notation

Traditionally, computer scientists are much more interested in the idea that "f grows at most at the rate of g". This corresponds to the second part of the inequality used to define big Theta in the previous section.

Definition

f is of order at most g means that there exist positive real number constants $B$ and $x_{0}$ so that \[ |f(x)| \leq B|g(x)| \text{ for all } x > x_{0}.\] This is usually stated (by computer scientists) as "$f(x)$ is Big O of $g(x)$" and written as $f(x) = O(g(x)).$

Note that Big O only gives an upper bound on the growth rate of functions. That is, the function $f(x) = x + \sin(x)$ with domain and range $\mathbb{R},$ used in an earlier example, is $O(x)$ but also is $O(x^{2})$ and is $O(2^{x}).$

Big O is typically used to analyze the worst case complexity of an algorithm. If, for example, $n$ is the size of the input, then big O really only cares about what happens in the "worst-case" when $n$ becomes arbitrarily large. Mathematically, we want to consider time complexity in this asymptotic sense, when $n$ is arbitrarily large, so we may ignore constants. That we can ignore constants will make sense after discussing how limits, borrowed from continuous mathematics (that is, calculus), can be used to compare the rates of growth of two different functions.

12.3.1. Common Complexities To Consider

The complexities most commonly used, written as functions of the input size $n$ and ordered from smallest to largest, are as follows.

Constant complexity: $O(1)$
Logarithmic complexity: $O(\log (n))$
Radical complexity: $O(\sqrt{n})$
Linear complexity: $O(n)$
Linearithmic complexity: $O(n\log (n))$
Quadratic complexity: $O(n^2)$
Cubic complexity: $O(n^3)$
Exponential complexity: $O(b^n)$, $b > 1$
Factorial complexity: $O(n!)$
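To get a feel for this list, the sketch below evaluates one representative function from each class at two input sizes. The choice of base-2 logarithms, the base $b = 2$ for the exponential, and the names `growth` and `values_at` are assumptions made for illustration.

```python
import math

# One representative function for each complexity class in the list above.
growth = [
    ("constant", lambda n: 1),
    ("logarithmic", lambda n: math.log2(n)),
    ("radical", lambda n: math.sqrt(n)),
    ("linear", lambda n: n),
    ("linearithmic", lambda n: n * math.log2(n)),
    ("quadratic", lambda n: n ** 2),
    ("cubic", lambda n: n ** 3),
    ("exponential", lambda n: 2 ** n),
    ("factorial", lambda n: math.factorial(n)),
]


def values_at(n):
    """Evaluate every representative function at the input size n."""
    return [func(n) for _, func in growth]


for n in (4, 64):
    vals = values_at(n)
    print(n, "in increasing order:", vals == sorted(vals))
# 4 in increasing order: False
# 64 in increasing order: True
```

The ordering is asymptotic: at the small input $n = 4$ the cubic value $64$ is still larger than the exponential value $16,$ but by $n = 64$ the values appear in the listed order.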

To understand the sizes of input complexities, we will look at the graphs of functions; it is easier to consider these functions as ones that are defined for any real value input instead of just the natural numbers. This will also allow us to use continuous mathematics (that is, calculus) to analyze and compare the growth of different functions.

Radical growth is larger than logarithmic growth:

In the preceding graph, we’ve used $\text{Log}[x]$ to label the graph of a logarithmic function without stating the base for the logarithm: Is this the function $y = \log_{2}(x)$, $y = \log_{10}(x)$, $y = \ln(x) = \log_{e}(x)$, or a logarithm to some other base? For the purposes of studying growth of functions, it does not matter which of these logarithms we use: You may recall that one of the properties of logarithms states that for two different positive constant bases $a$ and $b$ we must have $\log_{a}(x) = \log_{a}(b) \cdot \log_{b}(x)$, where $\log_{a}(b)$ is also a constant. As stated earlier, we may ignore constants when considering the growth of functions.
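The change-of-base property can be seen numerically with $a = 2$ and $b = 10$: the ratio of the two logarithms is the same constant for every input. This is a minimal sketch, and the helper name `log_ratio` is an assumption made for illustration.

```python
import math


def log_ratio(x):
    """Return log_2(x) / log_10(x), which should not depend on x."""
    return math.log2(x) / math.log10(x)


constant = math.log2(10)    # the constant log_a(b) with a = 2 and b = 10
for x in (2, 50, 1000, 10 ** 6):
    print(x, log_ratio(x), math.isclose(log_ratio(x), constant))
```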

Polynomial growth is larger than radical growth:

Exponential growth is larger than polynomial growth:

Factorial growth is larger than exponential growth:

In the preceding graph, we’ve used $x!$ to label the graph of the function $y = \Gamma(x+1)$, where $\Gamma$ is the Gamma function; the function $y = \Gamma(x+1)$ is defined and continuous for all nonnegative real numbers $x$ and satisfies $n! = \Gamma(n+1)$ for every $n \in \mathbb{N}$. Further study of the Gamma function is beyond the scope of this textbook.

Using the graphical analysis of the growth of typical functions we have the following growth ordering, also presented graphically on a logarithmic scale graph.

Ordering of Basic Functions by Growth

$1, \log n, \sqrt[3]{n}, \sqrt{n}, n, n^2, n^3, 2^n, 3^n, n!, n^n$

The asymptotic behavior of a function should be determined by its most dominant term for large inputs. For example, $f(x)=x^{3} + 2x^{2}-2x$ is dominated by the term $x^3$ for large $x$: $f(1000) = 1.001998 \times 10^9 \approx 1 \times 10^9 = 1000^3$. For large $x$, $f(x) \approx x^3$, that is, asymptotically $f(x)$ behaves as $x^3$. We write $f(x)=O(x^3),$ that is, $x^3 +2x^2-2x=O(x^3).$
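A short computation makes the dominance of the $x^3$ term visible: the ratio $f(x)/x^3$ gets closer to $1$ as $x$ grows. This is a minimal sketch, and the sample inputs are arbitrary choices.

```python
def f(x):
    return x**3 + 2 * x**2 - 2 * x


# The second column is f(x); the third is the ratio f(x) / x^3.
for x in (10, 1000, 10**6):
    print(x, f(x), f(x) / x**3)

# f(1000) is 1001998000, matching the value 1.001998 x 10^9 in the text.
```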

Likewise, if $c$ is a nonzero constant, we want to say that $c \cdot f(x)$ and $f(x)$ have the same asymptotic behavior for large $x$, or $O(c \cdot f(x))=O(f(x))$.

Example 1

Show that $f\left(x\right)=2x^2 +4x$ is $O(x^2)$

Solution

While intuitively we may understand that the dominant term for large $x$ is $x^2$ so that $f(x) = O\left(x^2\right)$, we show this formally by producing as witnesses $A=3$ and $n =4$ with reference to the following graph.
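Algebraically, for positive $x$ the inequality $2x^2 + 4x \leq 3x^2$ is equivalent to $4x \leq x^2,$ which holds exactly when $x \geq 4.$ The sketch below spot-checks the witnesses $A=3$ and $n=4$ on integer inputs; the helper name `bound` and the upper limit of the range are assumptions made for illustration.

```python
def f(x):
    return 2 * x**2 + 4 * x


def bound(x, A=3):
    return A * x**2


# The inequality f(x) <= 3x^2 holds at x = 4 (with equality) and beyond ...
print(all(f(x) <= bound(x) for x in range(4, 10001)))   # True
# ... but it fails just below 4, for example at x = 3.
print(f(3), bound(3))   # 30 27
```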

Example 2

Show that $f(x) =2x^3 +3x$ is $O(x^3)$, with $A=3$ and $n=2$. Support your answer graphically.

Solution

Notice that $ x^3 > 3x$ when $ x \geq 2$. This means $2x^3 +x^3 > 2x^3 +3x $ when $x >2 $. In other words $ 3x^3 > 2x^3 +3x$ whenever $ x>2$, confirming $A=3$ and $n=2$ as witnesses, and supported by the following graph.

To show that a function $f(x)$ is not $O(g(x))$, we must show that no constant $A$ can scale $g(x)$ so that $Ag(x) \geq f(x)$ for all sufficiently large $x$, as in the following example.

Example 3

Show that $ f(x) = x^2$ is not $ O( \sqrt{x})$.

Solution

Consider the graphs of $ \sqrt{x}$, $ 2 \sqrt{x}$, $ 3\sqrt{x}$, and the graph of $x^2$. Notice that eventually, or for $x$ large enough, $x^2$ is larger than any $A \sqrt{x}$ as in the figure below

Suppose $A>1$ is given and fixed, then if $ f(x) = x^2$ is $ O(g(x))=O( \sqrt{x})$ , there is a corresponding $n$, also fixed, for which $A \sqrt{x} \geq x^2$ whenever $x>n$.

We solve the inequality $A \sqrt{x} \geq x^2$ by dividing both sides by $\sqrt{x} =x^{1/2}$ to obtain $A \geq x^{3/2}$.

But $A$ is fixed and cannot be greater than all arbitrarily large $ x^{3/2}$. Hence no such $n$ can exist for a given fixed $A$.

For example, consider $g(x)=A \sqrt{x}$ and $ f(x) =x^2 $, when $ x= A^2$ we obtain $ g(A^2) = A \sqrt{(A^2)}= A^2$ and $ f(A^2) = {\left ( {A}^2 \right )}^2$ and $ f(A^2)= A^4 > A^2 = g(A^2) $ when $A>1$.
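The same observation can be checked for a few sample multipliers. This is a minimal sketch: the particular values of $A$ are arbitrary choices, and a finite check like this only illustrates the argument above.

```python
import math


def f(x):
    return x**2


def g(x, A):
    return A * math.sqrt(x)


# For each candidate multiplier A > 1, the bound A*sqrt(x) >= x^2 already
# fails at x = A^2, and it still fails at the larger input 10 * A^2.
for A in (2, 10, 1000):
    x = A**2
    print(A, f(x) > g(x, A), f(10 * x) > g(10 * x, A))   # A True True
```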

12.4. Properties of Big O notation

Suppose $f(x)$ is $O(F(x))$ and $g(x)$ is $O(G(x))$.

Properties of Big O Notation

$c \cdot f(x)$ is $O(F(x))$
$f(x)+g(x)$ is $O(\max(F(x), G(x)))$
$f(x) \cdot g(x)$ is $O(F(x) \cdot G(x))$

We can use these properties to show for instance $ 2x^2$ is $O\left(x^2\right)$. Likewise if $f(x) =2x^2$ and $g(x) =4x$, then $ 2x^2$ is $O(x^2)$ and $ 4x$ is $O(x)$, and the maximum gives that $2x^2+4x$ is $ O(\max(x^2, x)) =O(x^2)$.

It is true in general that if a polynomial $f(x)$ has degree $n$ then $f(x)$ is $O(x^n)$.

Big O for Polynomials

$p(x)=a_nx^n +a_{n-1}x^{n-1} +a_{n-2}x^{n-2}+\ldots +a_2x^2 +a_1x^1+a_0$ is $O(x^n)$

For example, since $f(x)= x^3+1$ is $O(x^3)$ and $g(x)=x^2-x$ is $O(x^2)$, the product $f(x) \cdot g(x)$ is $O(x^3 \cdot x^2) =O(x^5)$. This is verified explicitly by multiplying $f(x) \cdot g(x)= (x^3+1) \cdot (x^2-x)= x^5 -x^4+x^2-x$, which clearly is $O(x^5)$.
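The multiplication can be double-checked by treating each polynomial as a list of coefficients. This is a minimal sketch; the helper name `poly_mul` and the convention of listing coefficients from lowest degree to highest are assumptions made for illustration.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


f = [1, 0, 0, 1]    # 1 + x^3
g = [0, -1, 1]      # -x + x^2
product = poly_mul(f, g)
print(product)              # [0, -1, 1, 0, -1, 1], i.e. -x + x^2 - x^4 + x^5
print(len(product) - 1)     # 5, the degree of the product
```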

Example 4 - ordering by growth

Order the following functions by growth: $n \cdot \log_2 n$, $n^2$, $n^{4/3}$

Solution

Recall the ordering

$\log_2 n$, $n^{1/3}$, and $n$,

which is ordered by logarithmic, then radical, and then polynomial (or linear) growth.

Notice also that multiplying each by $n$ preserves the order.

$n \cdot \log_{2} n = n \times \log_{2} n$

$n^{4/3} = n \times n^{1/3}$

$n^2 = n \times n$

Then, using the original ordering $\log_2 n$, $n^{1/3}$, $n$, we also obtain the ordering $n \cdot \log_2 n$, $n^{4/3}$, $n^2$.
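A quick numerical check shows that this ordering is asymptotic: it need not hold for small inputs, but it does hold once $n$ is large enough. This is a minimal sketch, and the helper name `values` and the two sample inputs are assumptions made for illustration.

```python
import math


def values(n):
    """Return [n log_2 n, n^(4/3), n^2] for the given n."""
    return [n * math.log2(n), n ** (4 / 3), n ** 2]


for n in (8, 10**6):
    v = values(n)
    print(n, v == sorted(v))
# 8 False
# 1000000 True
```

At $n = 8$ the value of $n \cdot \log_2 n$ is $24,$ which is larger than $8^{4/3} = 16,$ so the three values are not yet in the final order.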

As a final example we consider ordering three functions by growth using the basic properties for Big O and the basic orderings.

Example 5

Find the Big O of each of the following and then rank by Big $O$ growth:

$f\left(x\right)=\left({3x}^3+x\right)2^x+\left(x+x!\right)x^4$

$g\left(x\right)=x^x(2^x+x^2)$

$h\left(x\right)=5x!+4x^3\log{x}$

Solution

First consider $f\left(x\right)$ and using the polynomial property observe that $\left({3x}^3+x\right)$ is $O(x^3)$. Using the multiplicative property, conclude that $\left({3x}^3+x\right)2^x$ is $O(x^32^x)$. Likewise using the sum property, $\left(x+x!\right)$ is $O\left(\max{\left(x,x!\right)}\right)= O (x!)$. Then using the multiplicative property, $\left(x+x!\right)x^4$ is $O (x^4x!)$. Then $f\left(x\right)=\left({3x}^3+x\right)2^x+\left(x+x!\right)x^4$ is $O\left(\max{\left(x^32^x,x^4x!\right)}\right)=O\left(x^4x!\right)$.

For $g(x)$, notice using the maximum property for the sum, that $2^x+x^2$ is $O(2^x)$. Then using the multiplicative property, $x^x(2^x+x^2)$ is $O(2^xx^x)$.

For $h\left(x\right)$, we want $O\left(\max{\left(x!,\ x^3\log{x}\right)}\right)=O(x!)$. Notice here, that $4x^3\log{x}$ is $O(x^4)$, and $x^4$ has smaller asymptotic growth than $x!$. In fact, $x^4$ is $O(x!)$.

So, $f(x)$ is $O\left(x^4x!\right)$, $g(x)$ is $O\left(2^xx^x\right)$, and $h(x)$ is $O\left(x!\right)$.

We conclude that from an ordering perspective, we have by increasing growth order, $h(x)$, $f(x)$, and $g(x)$. To convince yourself that $g(x)$ grows faster than $f(x)$, use the facts that $2^x$ grows faster than $x^4$, and $x^x$ grows faster than $x!$.
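The ranking can be illustrated by evaluating the three functions at a few integer inputs. This is a minimal sketch that assumes $\log$ means the natural logarithm and restricts $x$ to positive integers so that `math.factorial` applies; it is an illustration, not a proof.

```python
import math


def f(x):
    return (3 * x**3 + x) * 2**x + (x + math.factorial(x)) * x**4


def g(x):
    return x**x * (2**x + x**2)


def h(x):
    return 5 * math.factorial(x) + 4 * x**3 * math.log(x)


# Second column: is h(x) < f(x) < g(x)?  Third column: the ratio f(x) / g(x).
for x in (10, 20, 40):
    print(x, h(x) < f(x) < g(x), f(x) / g(x))
```

In each row the ratio $f(x)/g(x)$ is smaller than in the row before, which is consistent with $g(x)$ growing faster than $f(x).$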

12.5. Using Limits to Compare the Growth of Two Functions (CALCULUS I REQUIRED!)

In general, the Remix avoids using calculus methods because calculus is part of continuous mathematics, not discrete mathematics. However, it can be useful to use calculus to compare the growth of two functions $f(x)$ and $g(x)$ that are defined for real numbers $x$, are differentiable functions on the interval $(0,\, \infty)$, and satisfy $\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = \infty$. To avoid needing to use the absolute value, we can assume that $0 < f(x)$ and $0 < g(x)$ for all $x \geq 0$ (This assumption is safe to make since both functions go to infinity as $x$ increases without bound, which means that both functions are positive for all $x$ values greater than or equal to some number $x_{0}$… we are just assuming that $x_{0}=0$ which is the equivalent of shifting the plots of $f$ and $g$ to the left by $x_0$ units.)

If $f(x)$ and $g(x)$ are such functions and $\lim_{x \to \infty} \frac{f(x)}{g(x)} = L$, where $0 \leq L < \infty$, then $f(x)$ is $O(g(x)),$ and if $0 < L < \infty$ then $f(x)$ is $\Theta(g(x)).$

To see this, recall that $\lim_{x \to \infty} \frac{f(x)}{g(x)} = L$ means that we can make the value of $\frac{f(x)}{g(x)}$ be as close to $L$ as we want by choosing $x$ values that are sufficiently large. First suppose that $L > 0.$ Then we can make $L-\frac{L}{2} < \frac{f(x)}{g(x)} < L+\frac{L}{2}$ be true for all $x$ greater than some real number $x_{0}$. Now we can use the earlier stated assumption that $0 < g(x)$ to rewrite the inequality as $(L-\frac{L}{2}) \cdot g(x) < f(x) < (L+\frac{L}{2}) \cdot g(x)$, which is true for all $x >x_{0}$. We can choose for our witnesses $B = L + \frac{L}{2}$ and $x_{0}.$ This means that $f(x) < B \cdot g(x)$ whenever $x > x_{0},$ which shows that $f(x)$ is $O(g(x))$. Furthermore, we can choose $A = L - \frac{L}{2}$ as a witness for the lower bound, too, which means that $ A \cdot g(x) < f(x) < B \cdot g(x)$ whenever $x > x_{0},$ so $f(x)$ is $\Theta(g(x))$. If instead $L = 0,$ we can make $\frac{f(x)}{g(x)} < 1$ be true for all $x$ greater than some real number $x_{0},$ so $f(x) < 1 \cdot g(x)$ whenever $x > x_{0},$ and the witnesses $B = 1$ and $x_{0}$ show that $f(x)$ is $O(g(x))$.

Note that using this method does not focus on determining the actual numerical values of the witnesses $B$ and $x_{0}$ but just guarantees that the witnesses exist, which is all that is needed to show that $f(x)$ is $O(g(x))$.

Example 6

Show that $100,000 n + n \cdot log (n)$ is $O(n \cdot log (n))$.

Solution

Notice that the expressions $100,000 x + x \cdot log (x)$ and $x \cdot log (x)$ can be used to define differentiable functions on the interval $(0,\, \infty)$. We changed the variable from $n$ to $x$ to stress that we are treating the variable as a real number in this example. Also, we will assume that $log (x)$ is the natural logarithm; as mentioned earlier, any other base for the logarithm results in a constant multiple of the natural logarithm and will not affect the Big-$O$ computations.

Let $f(x) = 100,000 x + x \cdot log (x)$ and $g(x) = x \cdot log (x)$. It is easy to see that $\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = \infty$.

Now let’s compute $\lim_{x \to \infty} \frac{f(x)}{g(x)}$, that is, $\lim_{x \to \infty} \frac{100,000 x + x \cdot \log(x)}{x \cdot \log (x)}$. Direct computation gives the indeterminate form $\frac{\infty}{\infty}$, so we can use L’Hôpital’s rule to write $\lim_{x \to \infty} \frac{100,000 x + x \cdot \log(x)}{x \cdot \log(x)} = \lim_{x \to \infty} \frac{100,000 + (1 \cdot \log (x) + x \cdot \frac{1}{x})}{1 \cdot \log(x) + x \cdot \frac{1}{x}} = \lim_{x \to \infty} \frac{100,000 + \log (x) + 1}{\log(x) + 1}$. This limit still gives us an indeterminate form if we try to directly find the limits of the numerator and denominator separately without some simplification, but we can divide both numerator and denominator by $\log (x)$ to rewrite the last limit as the equivalent limit $\lim_{x \to \infty} \frac{\frac{100,001}{\log (x)} + 1}{1 + \frac{1}{\log(x)}} = \frac{0+1}{1+0} = 1$. Since the limit is a positive finite number, $100,000 x + x \cdot \log (x)$ is $\Theta(x \cdot \log (x)),$ which means that it is also $O(x \cdot \log (x)).$ As mentioned above, we do not need to find the actual values of the witnesses when using this limit method.

12.6. Exercises

Give Big O estimates for
1. $f\left(x\right)=4$
2. $f\left(x\right)=3x-2$
3. $f\left(x\right)=5x^6-4x^3+1$
4. $f\left(x\right)=2\sqrt{x}+5$
5. $f\left(x\right)=x^5+4^x$
6. $f\left(x\right)=x\log{x}+3x^2$
7. $f\left(x\right)=5x^2e^x+4x!$
8. $f\left(x\right)=\displaystyle \frac{x^6}{x^2+1}$ (Hint: Use long division.)
Give Big O estimates for
1. $f\left(x\right)=2^5$
2. $f\left(x\right)=5x-2$
3. $f\left(x\right)=5x^8-4x^6+x^3$
4. $f\left(x\right)=4\sqrt[3]{x}+3$
5. $f\left(x\right)=3^x+4^x$
6. $f\left(x\right)=x^2\log{x}+5x^3$
7. $f\left(x\right)=5x^6\cdot{10}^x+4x!$
8. $f\left(x\right)=\displaystyle \frac{x^5+2x^4-x+2}{x+2}$ (Hint: Use long division.)
Show, using the definition, that $f\left(x\right)=3x^2+5x$ is $O(x^2)$ with $A=4$ and $n=5$. Support your answer graphically.
Show, using the definition, that $f\left(x\right)=x^2+6x+2$ is $O(x^2)$ with $A=3$ and $n=6$. Support your answer graphically.
Show, using the definition, that $f\left(x\right)=2x^3+6x^2+3$ is $O(x^3)$. State witnesses $A$ and $n$, and support your answer graphically.
Show, using the definition, that $f\left(x\right)=3x^3+10x^2+1000$ is $O(x^3)$. State the witnesses $A$ and $n$, and support your answer graphically.
Show that $f\left(x\right)=\sqrt{x}$ is $O\left(x^3\right)$, but $g\left(x\right)=x^3$ is not $O(\sqrt{x})$.
Show that $f\left(x\right)= x^2$ is $O\left(x^3\right)$, but $g\left(x\right)=x^3$ is not $O(x^2)$.
Show that $f\left(x\right)=\sqrt{x}$ is $O\left(x\right)$, but $g\left(x\right)=x$ is not $O(\sqrt{x})$.
Show that $f\left(x\right)=\sqrt[3]{x}$ is $O\left(x^2\right)$, but $g\left(x\right)=x^2$ is not $O(\sqrt[3]{x})$.
Show that $f\left(x\right)=\sqrt[3]{x}$ is $O\left(x\right)$, but $g\left(x\right)=x$ is not $O(\sqrt[3]{x})$.
Order the following functions by growth: $x^\frac{7}{3},\ e^x,\ 2^x,\ x^5,\ 5x+3,\ 10x^2+5x+2,\ x^3,\ \log{x},\ x^3\log{x}$
Order the following functions by growth from slowest to fastest: $3x!,\ {10}^x,\ x\cdot\log{x},\ \log{x}\cdot\log{x},\ 2x^2+5x+1,\ \pi^x,\ x^\frac{3}{2},\ 4^5,\ \sqrt{x}\cdot\log{x}$
Consider the functions $f\left(x\right)=2^x+2x^3+e^x\log{x}$ and $g\left(x\right)=\sqrt x+x\log{x}$. Find the best big $O$ estimates of
1. $(f+g)(x)$
2. $(f\cdot\ g)(x)$
Consider the functions $f\left(x\right)=2x+3x^3+5\log{x}$ and $g\left(x\right)=\sqrt x+x^2\log{x}$. Find the best big $O$ estimates of
1. $(f+g)(x)$
2. $(f\cdot\ g)(x)$
State the definition of "$f(x)$ is $O(g(x))$" using logical quantifiers and witnesses $A$ and $n$.
Negate the definition of "$ f(x)$ is $ O(g(x))$" using logical quantifiers, and then state in words what it means that $ f(x)$ is not $ O(g(x))$.

13. Algorithms and Their Analysis

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on November 16, 2025.
added closed-form solution for the recurrence relation that describes the number of comparisons used by the MergeSort algorithm for lists of length a power of 2.

An algorithm is a step-by-step process, defined by a set of instructions to be executed sequentially, that is used to compute a value or solve a problem. Algorithms are used in discrete mathematics to perform numerical computations, to sort lists, to do searches, insertions, and deletions of items in data structures, to solve optimization problems, and more.
The word algorithm is derived from the name of Abū ‘Abd Allāh Muḥammad ibn Mūsā al-Khwārizmī. Western Europeans learned how to do arithmetic with Hindu-Arabic numerals and the base-ten place-value system from a 12th-century AD Latin translation of one of al-Khwārizmī’s books. (The Latin translation was done about 300 years after al-Khwārizmī wrote the original book in Arabic; the original is now lost.) By the way, the word algebra is derived from the title of another one of al-Khwārizmī’s books.

In this chapter, several algorithms are implemented in Python using non-optimal code to illustrate all steps needed. As mentioned in the About this text chapter, this was a deliberate choice: the code examples are designed to teach the mathematical concepts, not to present the most efficient implementations or to illustrate the "Pythonic" way of coding. However, a few code examples in this chapter do show how you can complete the same task by using some of Python’s built-in methods.

Key terms and concepts covered in this chapter:

Algorithm
- Properties of an algorithm
Types of algorithm
- Arithmetic
- Search
- Sorting

13.1. What Is An Algorithm? What Is Not An Algorithm?

There are several problem-solving strategies that humans use, but not all of these strategies are algorithms.

An algorithm should have the following properties, based on Donald E. Knuth’s description in his famous The Art of Computer Programming, Volume 1: Fundamental Algorithms (3rd edition):

Definiteness	Each step must be precisely defined. The actions taken at each step must be unambiguously specified.
Effectiveness	Each step must be so basic that it can be done exactly and in a finite length of time.
Finiteness	The process must terminate after a finite number of steps (although the number of steps may be a very large natural number.)
Input	Zero or more input quantities are taken from a specified set. Inputs can be introduced before the initial step of the algorithm or dynamically before any later step of the algorithm.
Output	The algorithm produces one or more outputs that have a specified relation to the inputs. Furthermore, the algorithm should produce the correct outputs for each set of inputs.

13.1.1. An Example: Finding the Minimum in a List of Integers

Consider the following algorithm for finding the minimum element in a finite sequence of integers.

Task: Given a finite list of integers, find the minimum value in the list.
Input: A finite list of integers L.
Steps:
1. Set the current index to 0, define a variable $min$, and set the value of $min$ to the value of the element at index 0 (the initial value in the list).
2. While there are values in the list that have not been examined,
  1. Increase the index by 1
  2. If the value of the element at the current index is less than the value of $min,$ set the value of $min$ to the value of the element at the current index.
Output: The value of $min.$

Does this satisfy all the requirements for an algorithm? Each step is precisely defined and can be done exactly and in a finite length of time. The process terminates after a finite number of steps (when $min$ has been compared to the element at the highest index of the list.) The input and output are specified. This process satisfies the requirements for an algorithm.

A Python implementation of this algorithm is given below.

Example 1 - Minimum in Python

The Python code below uses a while loop to implement an algorithm that finds the minimum in a list of integers.
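
A minimal sketch of such an implementation follows. The function name `find_min`, the variable names, and the sample list are illustrative assumptions, and the code assumes the list is nonempty.

```python
def find_min(L):
    """Return the minimum value in a nonempty list of integers."""
    index = 0
    minimum = L[0]                 # step 1: start with the initial value
    while index < len(L) - 1:      # step 2: some values are still unexamined
        index += 1                 # step 2.1: move to the next index
        if L[index] < minimum:     # step 2.2: keep the smaller value
            minimum = L[index]
    return minimum

print(find_min([7, 3, 9, -2, 5]))  # -2
```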

Note that Python provides a built-in function to find the minimum of a list: The minimum value of the list L can be computed using min(L).

13.2. Arithmetic Algorithms

Historically, algorithms first arose as ways to solve arithmetic problems. In this section, algorithms for such operations are presented.

13.2.1. Division By Repeated Subtraction

A simple division algorithm is presented in this subsection.

Example 2 - Division of Integers by Repeated Subtraction

The code below implements integer division of the positive integer a by the positive integer b. This algorithm is very ancient, dating back to at least Euclid’s Elements.
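
One way to write this in Python is sketched below. The function name `divide` and the variable names are illustrative assumptions.

```python
def divide(a, b):
    """Return (quotient, remainder) for positive integers a and b."""
    quotient = 0
    remainder = a
    while remainder >= b:   # keep subtracting b while it still fits
        remainder -= b
        quotient += 1
    return quotient, remainder

print(divide(17, 5))  # (3, 2)
```

When `b` is 1 the loop body runs exactly `a` times, which is the worst case.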

Big-O Analysis Of Division By Repeated Subtraction

Notice that the "division by repeated subtraction" algorithm has a worst-case scenario when the divisor b equals 1. In this worst-case scenario you must subtract the divisor $b=1$ exactly $a$ times to exit the loop, so "division by repeated subtraction" is $O(a),$ that is, of at most linear order in the larger number $a.$

13.2.2. Long Division Of Natural Numbers

You can revise the previous algorithm to work faster by using place-value thinking.

Example 3 - Long Division

The code below implements integer division of the integer a by the positive integer b.
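
The sketch below shows one possible binary long division. It assumes that `a` is nonnegative, and the names `long_divide`, `shifted`, and `power` are illustrative assumptions.

```python
def long_divide(a, b):
    """Return (quotient, remainder) for integers a >= 0 and b > 0."""
    shifted = b     # b times a power of two
    power = 1       # the matching power of two
    # Double b until it is larger than a.
    while shifted <= a:
        shifted <<= 1
        power <<= 1
    quotient = 0
    remainder = a
    # Halve back down, subtracting whenever the shifted divisor fits.
    while power > 1:
        shifted >>= 1
        power >>= 1
        if remainder >= shifted:
            remainder -= shifted
            quotient += power
    return quotient, remainder

print(long_divide(17, 5))  # (3, 2)
```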

This long division algorithm uses powers of two (instead of ten) but otherwise is like the standard algorithm, using decimal notation, that you may have learned in school.
Notice that the shift operators << and >> multiply or divide by 2 (>> gives only the quotient, without keeping track of the remainder.)

Big-O Analysis Of Long Division

The worst-case scenario corresponds to the divisor $b=1,$ which requires you to shift as many times as there are binary digits in $a.$ Notice that $a$ is bounded above by 2 raised to the number of binary digits; this means that the long division algorithm is $O(\log (a)).$

Python provides built-in operators to find the quotient and remainder for variables a and b of type int: a//b is the quotient and a%b is the remainder. There is also the built-in function divmod(a,b) that returns an ordered pair (of data type tuple) containing both the quotient and the remainder, but divmod has the overhead of a function call and should only be used when you need to find both the quotient and the remainder.

13.2.3. Greatest Common Divisors: The Euclidean Algorithm

It is often necessary to find the greatest common divisor of two integers. The next algorithm goes back to Euclid.

Example 4 - The Euclidean Algorithm

The code below implements the Euclidean algorithm, which computes the greatest common divisor of the positive integers a and b.
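
A minimal sketch of the Euclidean algorithm is given here. The function name `euclid_gcd` is an illustrative assumption, chosen to avoid clashing with `math.gcd`.

```python
def euclid_gcd(a, b):
    """Return the greatest common divisor of positive integers a and b."""
    while b != 0:
        remainder = a % b   # divide a by b and keep only the remainder
        a = b               # the old divisor becomes the new dividend
        b = remainder       # the remainder becomes the new divisor
    return a

print(euclid_gcd(48, 18))  # 6
```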

Big-O Analysis Of The Euclidean Algorithm

The Euclidean Algorithm takes two positive integer inputs $a$ and $b$ with $a > b.$ It can be proven using mathematical induction that the worst-case scenario for this algorithm is $O(\log (b)).$ The proof uses the closed-form for the Fibonacci numbers.

Python’s math module includes a function gcd that will compute the greatest common divisor of two or more integers.

13.3. Search Algorithms

In the first two subsections you will see two algorithms for searching for a target integer within a list of integers. In the third subsection, Python’s built-in search methods are discussed.

RECOMMENDATION: The "Algorithms and Recursive Functions" activity can replace one or both of the first two subsections.

13.3.1. The Linear Search Algorithm

Linear search compares a target integer, t, to each element in a list of distinct integers, starting at index 0, and returns either the index i at which the target integer was found or a value indicating that the target integer was not found in the list.

A Python implementation of the linear search algorithm is given below.

Example 5 - Linear Search Algorithm in Python

The Python code below uses a while loop to implement the linear search algorithm. The code prints either the index at which the target was found in the list or the built-in constant None to indicate that the target was not found.
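
A sketch of such a linear search might look like this. The function name `linear_search` and the sample list are illustrative assumptions.

```python
def linear_search(L, t):
    """Return the index of t in the list L, or None if t is not in L."""
    i = 0
    while i < len(L):
        if L[i] == t:
            return i      # found the target at index i
        i += 1
    return None           # examined every element without finding t

print(linear_search([2, 4, 7], 7))  # 2
print(linear_search([2, 4, 7], 5))  # None
```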

Why not use -1 to indicate that the target integer was not found? Negative integers can be valid indices for a Python list! This is very different from other languages like Java, in which indices must be natural numbers. As an example, for the list $L = \lbrack 2,4,7 \rbrack$ we have $L \lbrack -1 \rbrack = 7,$ $L \lbrack -2 \rbrack = 4,$ and $L \lbrack -3 \rbrack = 2.$ If you are coding in Python it may be safer either to raise an exception or to use the built-in constant None to indicate that no index for the target was found.

Big-O Analysis of Linear Search

The linear search algorithm iterates across a list of $n$ data elements. If the first element in the list is the target element, the algorithm stops. Otherwise, the algorithm moves to the next element and repeats the comparison, continuing until either the target element is found or the end of the list is reached. If the target element is not in the search list, the algorithm exhaustively searches through every single element.

This is the worst-case scenario for linear search: the algorithm inspects every single element, either because the target element is the last element of the list or because the target element is not in the search list at all. The algorithm runs in $O(n)$ time in the worst case.

13.3.2. The Binary Search Algorithm

The binary search algorithm searches a sorted list L of integers for a target value t. The algorithm starts looking for t in the middle of the sorted list. If t is greater than the value in the middle, the algorithm continues the binary search in the upper half of the list, otherwise the algorithm continues the binary search in the lower half of the list. The algorithm continues in this way until we reach a list of length 1 that either does or does not have t as its only element.

Example 6 - Binary Search Algorithm in Python
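
The sketch below is one common way to code a binary search. It keeps track of the part of the list still being searched with two indices instead of building sublists, and it stops early if the middle element equals the target. The names `binary_search`, `low`, `high`, and `mid` are illustrative assumptions.

```python
def binary_search(L, t):
    """Return an index of t in the sorted list L, or None if t is not in L."""
    low = 0
    high = len(L) - 1
    while low <= high:
        mid = (low + high) // 2   # index of the middle element
        if L[mid] == t:
            return mid
        elif L[mid] < t:
            low = mid + 1         # continue in the upper half
        else:
            high = mid - 1        # continue in the lower half
    return None

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # None
```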

Big-O Analysis of Binary Search

The binary search algorithm searches for a target element $x$ in a sorted list of $n$ elements by comparing the middle element of the list with the target $x$. The algorithm stops if the middle element $a_m$ is the target element. Otherwise the search continues with half the data set: the half to the left if the middle element is larger than the target $x$, or the half to the right if the middle element is smaller than the target.

The number of steps in the binary search then is the number of times we have to split the data set until we locate the target element, or determine that the target element is not in the search list after splitting down to 1 element.

The number of times we need to split the data set of size $n$, in the worst case then, is $p$ which is found by solving the exponential equation,

$2^p = n$.

The algorithm then is $O(p)$.

The solution of the exponential equation, $2^p = n$, is in log form,

$p=\log_2{n}$.

The binary search algorithm then is $O(p)=O(\log{n})$.

13.3.3. Searching Within a List using Python

In this subsection you will see how to use Python to efficiently search lists.

Example 7 - Searching a List in Python

You can search for the index of a target value in a Python list by calling the list.index(x) method. This method returns the least natural number index of the target value if it is found in the list, otherwise it raises a ValueError.
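
For instance, with a small sample list (the list `s` is an illustrative assumption):

```python
s = [2, 4, 7, 4]
print(s.index(4))   # 1, the least index at which 4 occurs

try:
    s.index(5)      # 5 is not in the list, so a ValueError is raised
except ValueError:
    print("5 is not in the list")
```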

If you need to know whether the target value x is in the list s but do not need the least index, you can use x in s which returns a Boolean.
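
A short illustration, again with an assumed sample list `s`:

```python
s = [2, 4, 7, 4]
print(4 in s)   # True
print(5 in s)   # False
```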

If you need to know how many times the target value x occurs in the list s you can use the list.count(x) method.
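
A short illustration with the same assumed sample list `s`:

```python
s = [2, 4, 7, 4]
print(s.count(4))   # 2
print(s.count(5))   # 0
```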

13.4. Sorting Algorithms

In this section you will see three algorithms for sorting a list of real numbers. Two of these algorithms, bubble sort and insertion sort, are inefficient but are presented here as in many other textbooks because they are easy to understand and analyze. The third algorithm, merge sort, is an efficient recursive algorithm.

In Python, the elements of a list L can be sorted into increasing order by calling the list.sort() method. This built-in method uses one of two sorting algorithms that won’t be discussed in this textbook: the Timsort algorithm in Python versions 2.3 to 3.10, or the Powersort algorithm in Python versions 3.11 to 3.12 (the current version as of this writing).

Python built-in sort() method

The code below uses Python’s built-in sort() method.
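
A minimal illustration follows; the sample list is an assumption. Notice that `sort()` rearranges the list in place and returns `None`.

```python
L = [3.5, -1, 7, 2, 0]
L.sort()        # sorts L in place into increasing order
print(L)        # [-1, 0, 2, 3.5, 7]
```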

13.4.1. Bubble Sort

The bubble sort algorithm is a simple sorting procedure. It is typically used to sort a list of n data elements in either increasing or decreasing order.

NOTE: This algorithm is called "bubble sort" because "the lighter items bubble up to the top" of the list, closer to index 0, like bubbles in a drink.

WARNING: The bubble sort algorithm produces the correct result but is very inefficient. You should almost never use bubble sort in code that you write. In almost every application that requires sorting a list, there is an algorithm that can be used that is much more efficient than bubble sort. You have been warned!
In fact, most modern programming languages have built-in sort methods for you to use; these built-in methods are implemented using efficient algorithms.

We describe the bubble sort algorithm for arranging a list of $n$ real numbers in increasing order.

The algorithm compares the first two elements of the list and swaps them if they are out of order.
It continues by traversing the list in order of increasing index, comparing each pair of adjacent elements and swapping them if they are out of order until we reach the last entry in the list at index $n-1$.
The last entry in the list will then be the largest element of the original list.
After the largest element has been sorted into position $n-1$, the algorithm continues by again comparing the first two elements and swapping if they are out of order.
Continue traversing the list, comparing adjacent elements and swapping those that are out of order, until position $n-2$ of the list, after which the 2nd largest element is at index $n-2$. The elements now at indices $n-1$ and $n-2$ are sorted.
Continue to sort at indices $n-3,$ then $n-4,$ and so on, until all elements are in increasing order.

A Python implementation of the bubble sort algorithm is given below.

Python implementation of the Bubble Sort Algorithm

The code below uses two nested while loops to implement the Bubble Sort algorithm.
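
A sketch of the naive bubble sort is given below. As an addition that is not part of the algorithm itself, the function also counts the comparisons it makes. The names `bubble_sort`, `last`, and `comparisons` are illustrative assumptions.

```python
def bubble_sort(L):
    """Sort the list L in place into increasing order.

    Returns the number of comparisons made.
    """
    comparisons = 0
    last = len(L) - 1          # index where the current pass ends
    while last > 0:            # one pass per iteration
        i = 0
        while i < last:
            comparisons += 1
            if L[i] > L[i + 1]:                   # out of order, so swap
                L[i], L[i + 1] = L[i + 1], L[i]
            i += 1
        last -= 1              # the element at index last is now in place
    return comparisons

data = [5, 1, 4, 2, 8]
count = bubble_sort(data)
print(data)    # [1, 2, 4, 5, 8]
print(count)   # 10
```

For a list of length 5 the count is $4+3+2+1=10$ comparisons.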

Big-O Analysis of Bubble Sort

We analyze the bubble sort algorithm beginning with a concrete list of size $n=5$ and generalize the analysis.

Consider the case of a list of size $n=5$. The naive bubble sort algorithm in this case will involve 4 passes.

In the first pass, there will be 4 comparisons and up to 4 swaps, after which the element in position 5 is in its correct position.

In the second pass, there will be 3 comparisons and up to 3 swaps, after which the element in position 4 is in its correct position.

In the third pass, there will be 2 comparisons and up to 2 swaps, after which the element in position 3 is in its correct position.

In the fourth pass, there will be 1 comparison and up to 1 swap, after which the elements in positions 1 and 2 are both in their correct positions.

Adding the comparisons from each pass we obtain,

$4+3+2+1=1+2+3+4$.

In general, if the list is of size $n$, there will be $n-1$ passes, and the total number of comparisons is

$(n-1)+(n-2)+...+2+1 = 1+2+...+(n-2)+(n-1)$.

You can use mathematical induction to prove that \[1+2+\cdots+(n-2)+(n-1)= \frac{(n-1)\cdot n}{2}\] and since $\frac{(n-1)\cdot n}{2} =\frac{1}{2}n^2-\frac{1}{2}n$, the bubble sort algorithm is $O(n^2)$.

13.4.2. Insertion Sort

The insertion sort works through a list and classifies two sections as sorted and unsorted.

The insertion sort scans through each element of the list using an outer loop with a variable, say $i$.
At each stage, the list is divided into a sorted section, say the left section, and a section that is not sorted, say the right.
The location up to which the list is sorted is denoted by a pointer or index, called a key.
At the current stage, the next element from the unsorted section, on the right, is inserted into its appropriate position in the sorted section on the left.
The process of inserting smaller elements in the left section involves shifting larger elements to the right, using a variable, say $j$.

A Python implementation of the Insertion Sort Algorithm is given below:

Example 8 - Insertion Sort in Python

The code below uses two nested while loops to implement the Insertion Sort algorithm.
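
A sketch of insertion sort using two nested while loops is shown here. The function name `insertion_sort` and the variable name `current`, which holds the value being inserted, are illustrative assumptions.

```python
def insertion_sort(L):
    """Sort the list L in place into increasing order and return it."""
    i = 1                          # L[0:i] is the sorted section
    while i < len(L):
        current = L[i]             # next value from the unsorted section
        j = i - 1
        while j >= 0 and L[j] > current:
            L[j + 1] = L[j]        # shift a larger element to the right
            j -= 1
        L[j + 1] = current         # insert into its position
        i += 1
    return L

print(insertion_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```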

Big-O Analysis of Insertion Sort

It is left as an exercise to verify that the insertion sort algorithm is $O(n^2)$.

13.4.3. Merge Sort

Merge Sort is a recursive sorting algorithm. The general idea is to divide a list of length $n$ into $n$ sublists of length $1$, where a list of length one is considered already sorted. Subsequently, the algorithm repeatedly merges the sublists to make sorted lists until only a single list of length $n$ is remaining. This will be a sorted list consisting of the elements of the original list.

The picture below illustrates this with a list of length seven.
Image credit: "Merge Sort Algorithm Diagram" by VineetKumar. This work has been released into the public domain by its author, VineetKumar at English Wikipedia. This applies worldwide.

Example 9 - The Merge Sort Algorithm in Python
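
The sketch below is one possible recursive implementation. It returns a new sorted list instead of sorting in place, and it gives the extra element to the left half when the length is odd; both are choices made for this sketch. The names `merge` and `merge_sort` and the sample list are illustrative assumptions.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = 0
    j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # one comparison per loop iteration
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # copy whatever is left over
    merged.extend(right[j:])
    return merged

def merge_sort(L):
    """Return a new list with the elements of L in increasing order."""
    if len(L) <= 1:                    # base case: already sorted
        return L[:]
    mid = (len(L) + 1) // 2            # left half gets the extra element
    return merge(merge_sort(L[:mid]), merge_sort(L[mid:]))

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))
# [3, 9, 10, 27, 38, 43, 82]
```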

Big-O Analysis of Merge Sort

Define a function $C(n)$ with output equal to the number of comparisons needed by Merge Sort to order a list of size $n$. From the algorithm, you can see that $C(n)$ satisfies the following recurrence relation. \[ C(n) = C \left( \left\lceil \frac{n}{2} \right\rceil \right) + C \left( \left\lfloor \frac{n}{2} \right\rfloor \right) + (n - 1) \] with $C(1) = 0$ and $C(2) = 1.$

To simplify the analysis, let’s assume that $n$ is a power of $2$ so that the ceiling and floor are not needed in the recurrence relation. \[ C(n) = C \left( \frac{n}{2} \right) + C \left( \frac{n}{2} \right) + (n - 1) \] where $n$ is a power of $2$ and $C(1) = 0$ and $C(2) = 1.$ Notice that the recursive step can be rewritten as \[ C(n) = 2 C \left( \frac{n}{2} \right) + (n - 1). \] It can be shown that the closed-form solution to this recurrence relation is a function that is $O(n \log{n}).$
Informal exercise: Verify that $C(n) = n \cdot \log_{2}{(n)} - (n-1)$ is a closed-form solution of the simplified recurrence relation.
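
Before attempting a proof, you can spot-check the claimed closed form numerically. The sketch below evaluates the simplified recurrence directly for several powers of $2$ and compares each value with $n \cdot \log_{2}{(n)} - (n-1)$. This check is not a proof, and the function name `C` simply mirrors the notation above.

```python
def C(n):
    """Evaluate the simplified recurrence for n a power of 2."""
    if n == 1:
        return 0
    return 2 * C(n // 2) + (n - 1)

for k in range(0, 11):
    n = 2 ** k
    closed_form = n * k - (n - 1)   # n * log2(n) - (n - 1), since log2(n) = k
    assert C(n) == closed_form

print(C(8), 8 * 3 - (8 - 1))  # 17 17
```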

MORE TO COME!

14. Graphs

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on November 29, 2025.
Added content to subsection on the Traveling Salesperson Problem.
Revised example involving isomorphism and planarity.
Revised and expanded the discussion of isomorphisms and underlying graphs.

Graphs are discrete mathematical structures used to represent connections between individual objects. Graphs have applications in fields such as chemistry, network analysis, computing algorithms, and social sciences.

Key terms and concepts covered in this chapter:

Graphs
- Undirected graphs
- Directed graphs
- Weighted graphs
Creating new graphs from old graphs
- Subgraphs
- Unions and intersections of graphs
Graph isomorphism
Graph coloring
Connectivity of graphs
- Eulerian circuits
- Hamiltonian circuits
- Shortest-path problems
  - Dijkstra’s algorithm
  - Traveling Salesperson Problem (TSP)

14.1. Introduction and Definitions

A graph consists of a set of vertices and a set of edges. Each edge connects either two different vertices or one vertex to itself. Vertices are sometimes called nodes.

For each edge, its endpoints are the two vertices that it connects (but notice that the “endpoints” may be the same vertex.) The edge is said to be incident with each of its endpoints.
An edge that connects one vertex to itself is called a loop.
Two edges that have the same endpoints are called parallel edges.
Two vertices are adjacent if they are the endpoints of at least one edge. Adjacent vertices are also called neighbors.
The degree of a vertex $v$ is the number of times the vertex occurs as an endpoint for all the edges in the graph.
An isolated vertex is a vertex that is not an endpoint of any edge. That is, an isolated vertex is the same as a vertex that has degree 0.
A graph is called a multigraph if there are parallel edges in the graph.
A simple graph is a graph that has no loops and no parallel edges.

The following example illustrates many of these definitions.

Example 1

The graph shown has 8 vertices and 7 edges.

In this drawing, the vertices are represented by points that are labeled by capital letters, and the edges are represented by the line segments and arcs that connect two vertices.

The graph is drawn so that the edge with endpoints $B$ and $F$ and the edge with endpoints $D$ and $E$ are represented by two line segments that intersect, but the point of intersection is ignored in the graph because it is not a vertex. We could redraw this graph so that the two edges do not cross; for example, we could move $E$ inside the triangle that has vertices $B,$ $D,$ and $F.$ However, there are some graphs which cannot be drawn in 2 dimensions without some edge crossings.

There is a loop: The edge that has $D$ as both of its endpoints.

There are parallel edges that connect the vertices $A$ and $C,$ so this graph is a multigraph.

Vertex $G$ is an isolated vertex - it is not an endpoint of any edge. Vertex $H$ is also an isolated vertex.

This graph is a multigraph because there are multiple edges that connect the pair of vertices $A$ and $C.$

This graph is not a simple graph: It contains a loop, and it has at least one pair of parallel edges.

Question

For this graph, which vertex has the greatest degree? Which vertex has the least degree?

Hint

Recall that the degree of a vertex is the number of times that vertex occurs as an endpoint (and that a loop has two endpoints that are the same vertex.)

Notice that in the example and definitions given so far, we’ve assumed that each edge is undirected: An edge represents a connection between its endpoints but does not indicate a direction of travel from one endpoint to the other endpoint. However, in some applications it is useful for a graph to use only directed edges that point from one endpoint to the other endpoint. A graph that has only directed edges is called a digraph, which is short for directed graph.

Most graphs discussed in this chapter will be (undirected) simple graphs, that is, graphs with no loops and no parallel edges.

14.2. The Handshaking Lemma

This section describes a property of every undirected graph.

Recall that the degree of a vertex of a graph is the number of times the vertex occurs as an endpoint for all the edges in the graph. Keep in mind that the vertex of a loop counts twice because that vertex occurs as “both endpoints” of the loop.

Example 2

Consider again the graph that was drawn in the previous example.

The degrees of each of the vertices in this graph are listed in the table.

Vertex	Degree
A	2
B	2
C	2
D	5
E	1
F	2
G	0
H	0

Notice that the sum of the degrees of all the vertices is $14$, which is equal to twice the number of edges, $2 \cdot 7.$

In fact, for any undirected graph with finitely many edges, the sum of the degrees of all vertices is equal to twice the number of edges (recalling that a loop’s endpoint will be counted twice.)

The Handshaking Lemma

The sum of the degrees of the vertices of a graph is equal to twice the number of edges of the graph.

Proof

This is an exercise for you. Use mathematical induction on the predicate
$P(n):$ “If a graph has $n$ edges, then the sum of the degrees of all vertices of the graph is equal to $2n.$”

A useful consequence of the Handshaking Lemma is that the sum of the degrees of the vertices of a graph must be even.
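
The lemma can be checked in Python for the graph of Example 2. The edge list below is an assumption: it is one list of 7 edges that is consistent with the description of the graph and with the degree table, since the drawing itself is not reproduced here. Each edge is stored as a pair of endpoints, so the loop at $D$ appears as `("D", "D")`.

```python
# Assumed edge list, consistent with the degree table in Example 2.
edges = [("A", "C"), ("A", "C"), ("B", "D"), ("B", "F"),
         ("D", "D"), ("D", "E"), ("D", "F")]

degree = {v: 0 for v in "ABCDEFGH"}
for u, w in edges:
    degree[u] += 1   # count each endpoint once;
    degree[w] += 1   # a loop adds 2 to the same vertex

print(degree)
# {'A': 2, 'B': 2, 'C': 2, 'D': 5, 'E': 1, 'F': 2, 'G': 0, 'H': 0}
print(sum(degree.values()), 2 * len(edges))  # 14 14
```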

14.3. Simple Graphs

Recall that in a simple graph, there are no loops and no parallel edges (that is, you cannot have two different edges that connect the same pair of endpoints.) This means that for a simple graph, each edge is determined by its two distinct endpoints. This allows us to give a relatively simple but formal set-theoretic definition of “simple graph.” Graphs discussed in this textbook are assumed to be simple unless stated otherwise.

14.3.1. A Formal Definition of Simple Graph

This subsection presents a formal set-theoretic definition of simple graphs.

A simple graph $G=\left(V,\ E\right)$ is an ordered pair consisting of a nonempty set $V$ and a (possibly empty) set $E,$ where each element of $E$ must be of the form $\left\{x,y\right\}$, where $x$ and $y$ are two different elements of $V.$ The elements in set $V$ are called the vertices (or nodes) of the graph. The elements in set $E$ are called the edges of the graph.

Example 3 - A simple graph.

The graph shown has vertex set $\left\{A,\ B,\ C,\ D,\ E,\ F\right\}$ and edge set $ \{\{A,C\},\{A,D\},\{B,D\},\{B,F\},\{C,F\},\{D,F\},\{E,F\}\}.$

The degrees of each of the vertices in the undirected graph $G$ with vertex set $V=\{A,B,C,D,E,F,G\}$ and edge set $E=\{\{A,C\},\{A,D\},\{B,D\},\{B,F\},\{C,F\},\{D,F\},\{F,G\}\}$ are,

$d\left(A\right)=2$

$d\left(B\right)=2$

$d\left(C\right)=2$

$d\left(D\right)=3$

$d\left(E\right)=0$

$d\left(F\right)=4$

$d\left(G\right)=1$
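
The set-theoretic definition translates naturally into Python. The sketch below stores each edge of the graph $G$ above as a `frozenset` of two vertices, which is one possible representation among many, and then computes the degrees. The variable names are illustrative assumptions.

```python
V = {"A", "B", "C", "D", "E", "F", "G"}
E = {frozenset(pair) for pair in [("A", "C"), ("A", "D"), ("B", "D"),
                                  ("B", "F"), ("C", "F"), ("D", "F"),
                                  ("F", "G")]}

# In a simple graph, the degree of v is the number of edges containing v.
d = {v: sum(1 for e in E if v in e) for v in sorted(V)}
print(d)
# {'A': 2, 'B': 2, 'C': 2, 'D': 3, 'E': 0, 'F': 4, 'G': 1}
```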

14.4. Directed Graphs

The main focus of this chapter will be undirected simple graphs, but we will briefly discuss directed graphs in this section.

A directed graph (or digraph) is a graph in which the edges are directed from one vertex to another vertex. Each edge has an initial vertex $u$ and a terminal vertex $v;$ the edge is drawn as an arrow pointing from $u$ to $v.$

The out-degree of a vertex $w$ is the number of edges that have $w$ as the initial vertex. The in-degree of a vertex $w$ is the number of edges that have $w$ as the terminal vertex.

Example 4 - A directed graph and its underlying graph.

The directed graph shown has vertex set $\{A,B,C,D,E,F\}$ and edge set $\{ (A,C),(D,A),(B,D),(F,B),(C,F),(D,F),(F,E) \}$. The first coordinate of each edge is the initial vertex and the second coordinate is the terminal vertex.

Notice that the directed edges of this graph connect the same endpoints as the undirected edges of the graph in Example 3. That undirected graph is referred to as the underlying graph of the directed graph.
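
As a small illustration, the sketch below computes the out-degree and in-degree of each vertex of the directed graph in Example 4, and builds the edge set of its underlying graph by forgetting the direction of each edge. Storing edges as tuples and as `frozenset` objects is an assumption of this sketch, as are the variable names.

```python
edges = [("A", "C"), ("D", "A"), ("B", "D"), ("F", "B"),
         ("C", "F"), ("D", "F"), ("F", "E")]

out_degree = {v: 0 for v in "ABCDEF"}
in_degree = {v: 0 for v in "ABCDEF"}
for initial, terminal in edges:
    out_degree[initial] += 1
    in_degree[terminal] += 1

print(out_degree)  # {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 0, 'F': 2}
print(in_degree)   # {'A': 1, 'B': 1, 'C': 1, 'D': 1, 'E': 1, 'F': 2}

# Underlying graph: replace each ordered pair by a 2-element set.
underlying = {frozenset(e) for e in edges}
print(len(underlying))  # 7
```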

Example 5 - The game "rock, paper, scissors"

The graph $G=(V,E)$ with vertex set $V = \{ \text{"rock", "paper", "scissors"} \}$ and edge set $E = \{ \text{("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")} \}$ can be used to represent the game "rock, paper, scissors."

Each directed edge has for its initial vertex the loser and for its terminal vertex the winner.

14.4.1. Simple Directed Graphs: A Formal Definition

We can give a formal set-theoretic definition of simple directed graph as well. To indicate the directed edges, ordered pairs of vertices are used instead of 2-element sets.

A simple directed graph $G=\left(V,\ E\right)$ is an ordered pair consisting of a set $V$ of objects called vertices (or nodes) and a set $E$ of objects called directed edges. Each directed edge $e\in\ E$ is an ordered pair of the form $e=\left(x,y\right)$, where $x$ and $y$ are two different vertices in set $V.$ For the directed edge $e=\left(x,y\right),$ $x$ is the initial vertex of $e$ and $y$ is the terminal vertex of edge $e$.

14.5. Examples of Simple Graphs

This section presents several classes of graphs.

The complete graph $K_n$ is the simple graph with $n$ vertices such that any two distinct vertices are adjacent, that is, every pair of distinct vertices is joined by an edge. The image shows $K_{4},$ the complete graph on 4 vertices. Click here to see images of $K_{n}$ for the positive integers that are less than or equal to $12.$

The n-cube $Q_{n}$ can be described as the graph that has vertex set consisting of the $2^{n}$ bitstrings of length $n,$ and edges such that two vertices are adjacent if and only if the bitstrings differ in exactly one bit position. The image shows the three graphs $Q_{1},$ $Q_{2},$ and $Q_{3};$ these graphs can be used as a way to represent the power sets of sets that have $1,$ $2,$ and $3$ elements, respectively. Notice that $Q_{2}$ can be drawn as a square and that $Q_{3}$ can be represented as a cube in $3$-dimensional space (or by a drawing of a cube in a $2$-dimensional plane.)
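
The description of $Q_{n}$ can be turned directly into code. The sketch below builds the vertex set and edge set of $Q_{n}$ by comparing every pair of bitstrings, which is simple but not efficient. The function name `n_cube` is an illustrative assumption.

```python
from itertools import product

def n_cube(n):
    """Return (vertices, edges) of the n-cube as bitstrings and 2-element sets."""
    vertices = ["".join(bits) for bits in product("01", repeat=n)]
    edges = set()
    for u in vertices:
        for w in vertices:
            # Adjacent exactly when the bitstrings differ in one position.
            if sum(a != b for a, b in zip(u, w)) == 1:
                edges.add(frozenset((u, w)))
    return vertices, edges

V, E = n_cube(3)
print(len(V), len(E))  # 8 12
```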

A bipartite graph is a simple graph whose set of vertices can be partitioned into two disjoint nonempty sets such that every vertex is in exactly one of the two sets and every edge has one endpoint in each of the two sets. One way to think of a bipartite graph is that each vertex can be assigned one of two colors so that every edge must connect vertices of different colors. Notice that $Q_{1},$ $Q_{2},$ and $Q_{3}$ are all examples of bipartite graphs (Question: Is $Q_{n}$ a bipartite graph for every natural number $n?$ Why or why not?)
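The "two colors" description suggests a way to test a graph by computer: try to color the vertices one at a time and stop if an edge ends up with two endpoints of the same color. The sketch below assumes the graph is given as a dictionary that maps each vertex to a list of its adjacent vertices; the function name `two_coloring` and the two sample graphs are illustrative. It only tests whether a valid two-coloring exists and does not check the requirement that both sets be nonempty.

```python
from collections import deque


def two_coloring(adjacency):
    """Return a dict vertex -> 0 or 1 in which the endpoints of every edge
    get different colors, or None if no such coloring exists."""
    color = {}
    for start in adjacency:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in color:
                    color[w] = 1 - color[u]   # give w the other color
                    queue.append(w)
                elif color[w] == color[u]:
                    return None               # an edge joins two same-colored vertices
    return color


square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
triangle = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}

print(two_coloring(square) is not None)   # True
print(two_coloring(triangle))             # None
```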

This image shows the graph $K_{2,3},$ which is another example of a bipartite graph. Notice that $K_{2,3}$ has an additional property: Every pair of vertices $\{a, b \}$ with $a$ in the set of $2$ "upper" vertices and $b$ in the set of $3$ "lower" vertices are the endpoints of an edge. A bipartite graph that has this additional property is called a complete bipartite graph. In general, the symbol $K_{m,n}$ represents the complete bipartite graph that has two disjoint sets of vertices, one of cardinality $m$ and the other of cardinality $n,$ such that every pair of vertices that come from the different sets are joined by an edge. Notice that $Q_{1} = K_{1,1}$ and $Q_{2} = K_{2,2}$ are complete bipartite graphs, but that $Q_{3}$ is not a complete bipartite graph because, for example, there is no edge joining $000$ and $111.$
NOTE: The phrase "complete bipartite" needs to be read as a single term used to indicate that a bipartite graph has all the edges it can possibly have. For example, $K_{2,3}$ is a bipartite graph such that if you tried to enlarge it by inserting an additional edge into the graph, that edge would join either the $2$ "upper" vertices, $2$ of the "lower" vertices, or $2$ vertices that are already joined; in this sense, $K_{2,3}$ is "complete" as a bipartite graph. $K_{2,3}$ is not a "complete graph" in the sense of the earlier example in this section. In fact, since a "complete graph" must contain an edge for every pair of distinct vertices, the only graph that can be both a "complete graph" and a "complete bipartite graph" is $Q_{1} = K_{2} = K_{1,1}.$ Mathematicians recycle and reuse a lot of words…

14.6. Representing Simple Graphs

In addition to the vertex-edge drawing, a simple graph can be represented in other ways that are more useful for computing.

First, recall that if $u$ is a vertex of a simple graph, then vertex $v$ is said to be adjacent to $u$ if and only if $u$ and $v$ are the endpoints of an edge of the graph.

One way to represent a simple graph is by using an adjacency list. This list can be written as a table, where each row has two columns. In each row, the entry in the first column is a single vertex $v$ and the entry in the second column is a list of all vertices of the graph that are adjacent to $v.$

Another way to represent a simple graph is by using an adjacency matrix. The adjacency matrix of a simple graph represents the graph in table form, and contains an entry for each pair of vertices. For each vertex of the graph, there is a row and also a column. If vertices $u$ and $v$ are adjacent (that is, connected by some edge), then the adjacency matrix will contain a $1$ in the position that corresponds to the row for $u$ and the column for $v;$ otherwise, the matrix contains a $0$ at that position. The next example may help make this clearer.

Example 6 - Representing A Simple Graph

The graph with vertex set $\left\{A,\ B,\ C,\ D,\ E,\ F\right\}$ and edge set $\{\{A,C\},\{A,D\},\{B,D\},\{B,F\},\{C,F\},\{D,F\},\{E,F\}\}$ can be represented by

the drawing

or the adjacency list

| Vertex | Adjacent Vertices |
| --- | --- |
| A | C, D |
| B | D, F |
| C | A, F |
| D | A, B, F |
| E | F |
| F | B, C, D, E |

or the adjacency matrix

$\mathbf{M}=\left(\begin{matrix}0&0&1&1&0&0\\0&0&0&1&0&1\\1&0&0&0&0&1\\1&1&0&0&0&1\\0&0&0&0&0&1\\0&1&1&1&1&0\\\end{matrix}\right)$
For example, in matrix $\mathbf{M}$ the rows, from top to bottom, correspond to the vertices $A,\ B,\ C,\ D,\ E,\ F$ and the columns, from left to right, correspond to vertices $A,\ B,\ C,\ D,\ E,\ F.$ The values in row 3, which corresponds to vertex $C$, indicate whether the vertex for that column is adjacent to $C.$ If we use the symbol $M_{r,c}$ to stand for the value in row $r$ and column $c,$ then $M_{3,5} = 0$ because there is no edge in the graph with endpoints $C$ and $E,$ and $M_{3,6} = 1$ because there is an edge in the graph with endpoints $C$ and $F$.
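Both representations in Example 6 can be built from the edge set. The sketch below is one way to do it, assuming a dictionary of lists for the adjacency list and a list of lists for the adjacency matrix; the variable names are illustrative. Keep in mind that Python counts rows and columns from $0,$ so the entry $M_{3,6}$ of the text is `M[2][5]` in the code.

```python
vertices = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "C"), ("A", "D"), ("B", "D"), ("B", "F"),
         ("C", "F"), ("D", "F"), ("E", "F")]

# Adjacency list: each vertex maps to the sorted list of its neighbors.
adjacency_list = {v: [] for v in vertices}
for u, v in edges:
    adjacency_list[u].append(v)
    adjacency_list[v].append(u)
for v in vertices:
    adjacency_list[v].sort()

# Adjacency matrix: the row and column order follows the list `vertices`.
index = {v: i for i, v in enumerate(vertices)}
M = [[0] * len(vertices) for _ in vertices]
for u, v in edges:
    M[index[u]][index[v]] = 1
    M[index[v]][index[u]] = 1   # undirected edge, so the matrix is symmetric

print(adjacency_list["D"])   # ['A', 'B', 'F']
print(M[2])                  # row for C: [1, 0, 0, 0, 0, 1]
```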

14.7. Weighted Graphs

In some applications, each edge of a graph has a weight, which is some nonnegative number. The weight could represent the physical distance between the two endpoint nodes, or could represent the cost to travel or transmit data between the endpoint nodes.

You can use an adjacency matrix to describe a weighted graph, but instead of using a $1$ to represent that there is an edge between two vertices, you place the weight of the edge in the correct position of the adjacency matrix, as shown in the following example.

Example 7 - Weighted Graph

Consider the following weighted simple graph.

The adjacency matrix of this weighted graph is $ \left(\begin{matrix}0&2&5&0\\2&0&3&0\\5&3&0&1\\0&0&1&0\\\end{matrix}\right). $
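The matrix in Example 7 can be rebuilt from a list of weighted edges. In the sketch below, the vertex numbers $0$ through $3$ are an assumption that simply follows the row order of the matrix, since the labels in the drawing are not reproduced here, and the variable names are illustrative.

```python
# Weighted edges read off the matrix in Example 7. The vertex numbers 0..3
# are an assumption (row order), not labels taken from the drawing.
weighted_edges = {(0, 1): 2, (0, 2): 5, (1, 2): 3, (2, 3): 1}
n = 4

W = [[0] * n for _ in range(n)]
for (u, v), weight in weighted_edges.items():
    W[u][v] = weight
    W[v][u] = weight   # undirected graph, so the matrix is symmetric

for row in W:
    print(row)
```

One design choice to be aware of: in this representation a $0$ means "no edge," so it cannot also be used as the weight of an edge. If edges of weight $0$ are needed, a different marker such as `None` could be used for missing edges.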

14.8. Creating New Graphs From Old Graphs

Given a set of one or more graphs, there are several ways to create new graphs using the graphs in the set.

14.8.1. Subgraphs

Given a simple graph $G,$ you can form a subgraph $H$ by choosing a subset of the vertices of $G$ along with a subset of the edges of $G$ such that each edge has endpoints in the set of vertices you chose. That is, $H$ is a subgraph of $G$ if $H$ is a graph such that every vertex of $H$ is a vertex of $G$ and every edge of $H$ is an edge of $G.$
More formally, $H = (V_{H}, E_{H})$ is a subgraph of $G = (V,E)$ if and only if all three of the following statements are True: $V_{H} \subseteq V,$ $E_{H} \subseteq E,$ and for every edge $e \in E_{H}$ the endpoints of $e$ are in $V_{H}.$

If $v$ is a vertex of $G,$ we denote by $G-v$ the subgraph obtained from $G$ by removing the vertex $v$ along with all edges in $E$ that have $v$ as an endpoint.

The image shows a graph $G$, and the subgraph $G-d$ formed by removing the vertex $d$.

In the same way, you can obtain a subgraph by removing multiple vertices along with the edges associated with the removed vertices. The subgraph obtained is called the subgraph induced by removing those vertices.
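Removing vertices and checking the formal definition of a subgraph are both short set computations. The sketch below assumes edges are stored as `frozenset` objects and, because the graph in the image is not reproduced here, reuses the graph from Example 6 as sample data; the function names are hypothetical.

```python
def remove_vertices(vertices, edges, removed):
    """Return the subgraph obtained by deleting the `removed` vertices and
    every edge that has a deleted vertex as an endpoint."""
    removed = set(removed)
    new_vertices = set(vertices) - removed
    new_edges = {e for e in edges if not (e & removed)}
    return new_vertices, new_edges


def is_subgraph(vh, eh, v, e):
    """Check the three conditions in the formal definition of a subgraph."""
    return vh <= v and eh <= e and all(edge <= vh for edge in eh)


# The graph from Example 6, used here only as sample data.
V = set("ABCDEF")
E = {frozenset(p) for p in ["AC", "AD", "BD", "BF", "CF", "DF", "EF"]}

new_V, new_E = remove_vertices(V, E, {"D"})
print(sorted(new_V))                     # ['A', 'B', 'C', 'E', 'F']
print(len(new_E))                        # 4
print(is_subgraph(new_V, new_E, V, E))   # True
```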

Example 8

Below is a graph $G=(V,E)$ and the subgraph obtained by removing the vertices $a$ and $d,$ that is, the subgraph with vertex set $V-\{a,d\}.$ With a slight abuse of notation, this induced subgraph is written $G-\{a,d\}.$

14.8.2. Unions and Intersections Of Graphs

Given two simple graphs $G_{1}$ and $G_{2}$, you can form the union of the graphs by taking the union of the two sets of vertices to get a new set of vertices, and taking the union of the two sets of edges to get a new set of edges. Notice that any edge that is in both graphs will only appear once in the new graph because you took the union of the sets of edges, that is, you can’t create parallel edges by forming the union.

In the same way, you can form the intersection of two simple graphs by taking the intersection of the two sets of vertices to get a new set of vertices, and taking the intersection of the two sets of edges to get a new set of edges.
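Since a simple graph is a pair of sets, the union and intersection can be computed with Python's built-in set operators. The two small graphs below are hypothetical sample data, and the function names are illustrative.

```python
def graph_union(g1, g2):
    """Union of two graphs, each given as a pair (vertex set, edge set)."""
    (v1, e1), (v2, e2) = g1, g2
    return v1 | v2, e1 | e2


def graph_intersection(g1, g2):
    """Intersection of two graphs, each given as a pair (vertex set, edge set)."""
    (v1, e1), (v2, e2) = g1, g2
    return v1 & v2, e1 & e2


# Hypothetical graphs: the paths 1-2-3 and 2-3-4 share the edge {2, 3}.
G1 = ({1, 2, 3}, {frozenset({1, 2}), frozenset({2, 3})})
G2 = ({2, 3, 4}, {frozenset({2, 3}), frozenset({3, 4})})

union_V, union_E = graph_union(G1, G2)
inter_V, inter_E = graph_intersection(G1, G2)
print(sorted(union_V), len(union_E))   # [1, 2, 3, 4] 3
print(sorted(inter_V), len(inter_E))   # [2, 3] 1
```

The shared edge $\{2,3\}$ is counted only once in the union, which illustrates why forming a union cannot create parallel edges.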

14.9. Graph Isomorphism

Recall that a graph is determined by its set of vertices and how those vertices are connected by edges, but not the drawing you use to represent the graph.

Example 9 - The Same Graph Can Be Drawn In More Than One Way

Consider the two graphs shown in the image.

Notice that these two graphs are different-looking drawings of the same graph that has vertex set $\{ A, B, C, D\}$ and edge set $\{\{A,B\},\{A,C\},\{A,D\},\{B,C\},\{B,D\},\{C,D\}\}.$ Also, notice that the drawing on the left appeared earlier in the chapter, but with unlabeled vertices: Each of the graphs in the image is a drawing of $K_{4},$ the complete graph on $4$ vertices.

Notice that using either the adjacency list

| Vertex | Adjacent Vertices |
| --- | --- |
| A | B, C, D |
| B | A, C, D |
| C | A, B, D |
| D | A, B, C |

or the adjacency matrix \[\left(\begin{matrix}0&1&1&1\\1&0&1&1\\1&1&0&1\\1&1&1&0\\\end{matrix}\right)\] makes it easier to see that the two drawings represent the exact same graph.

You can imagine the graph on the right being the result of dragging the vertex $C$ inside the "triangle" with vertices $A,$ $B,$ and $D.$

Sometimes, different graphs may be essentially the same graph, as in the next example.

Example 10 - Two Graphs That Are Essentially The Same Graph

Consider the two graphs, each with $4$ vertices and $6$ edges, shown in the image.

These graphs are not equal since the graph on the left has vertex set $\{ A, B, C, D\}$ and the graph on the right has vertex set $\{ W, X, Y, Z\}.$ However, by comparing the graph on the right to the one on the right in the previous example, you can see that there is a one-to-one correspondence between the two sets of vertices that preserves adjacency (that is, if two vertices in the upper row are endpoints of an edge of the graph on the left, then the corresponding vertices in the lower row are endpoints of an edge of the graph on the right.)

A one-to-one correspondence between the sets of vertices of two simple graphs that preserves adjacency is called a graph isomorphism, and the two graphs are said to be isomorphic. That is, two vertices are endpoints of an edge in the first graph if and only if the corresponding vertices are the endpoints of an edge in the second graph. Informally, you can think of two isomorphic graphs as a pair of graphs where a drawing of one graph can be relabeled and/or reshaped to obtain a drawing of the other graph (That is, the two graphs are really the same graph but have drawings that are labeled and/or shaped differently.)
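For very small graphs, the definition can be checked directly by trying every one-to-one correspondence between the vertex sets. The sketch below does exactly that; it assumes edges are stored as `frozenset` objects, the function name `find_isomorphism` is hypothetical, and the approach examines up to $n!$ correspondences, so it is only practical for a handful of vertices.

```python
from itertools import permutations


def find_isomorphism(v1, e1, v2, e2):
    """Return a dict mapping vertices of graph 1 to vertices of graph 2 that
    preserves adjacency, or None if no such correspondence exists."""
    v1, v2 = sorted(v1), sorted(v2)
    if len(v1) != len(v2) or len(e1) != len(e2):
        return None
    for image in permutations(v2):
        f = dict(zip(v1, image))
        # Relabel every edge of graph 1 and compare with the edges of graph 2.
        if {frozenset(f[x] for x in edge) for edge in e1} == set(e2):
            return f
    return None


# Two copies of K_4 with different vertex labels, as in Example 10.
K4_left = {frozenset(p) for p in ["AB", "AC", "AD", "BC", "BD", "CD"]}
K4_right = {frozenset(p) for p in ["WX", "WY", "WZ", "XY", "XZ", "YZ"]}
print(find_isomorphism("ABCD", K4_left, "WXYZ", K4_right))
# {'A': 'W', 'B': 'X', 'C': 'Y', 'D': 'Z'}

# A path and a "star" both have 4 vertices and 3 edges but are not isomorphic.
path = {frozenset(p) for p in ["AB", "BC", "CD"]}
star = {frozenset(p) for p in ["WX", "WY", "WZ"]}
print(find_isomorphism("ABCD", path, "WXYZ", star))   # None
```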

Example 11 - Using Graph Isomorphism

Using graph isomorphisms can help identify properties of a graph.

The three graphs in the image are isomorphic; it is an exercise for you to write out the one-to-one correspondences.

You Try

Write out the one-to-one correspondences between the sets of vertices that define the graph isomorphisms.

Once you have shown that the three graphs are isomorphic, you can use the fact that they are different representations of the same graph. For example,

It is not immediately clear that the graphs on the left and right are bipartite, but the arrangement of the vertices in the middle graph into "upper" and "lower" rows makes this easy to see.
Also, it is not immediately clear that the graph in the middle or the graph on the right is planar (that is, the graph can be redrawn in a $2$-dimensional plane so that no edges cross) but this is obvious for the graph on the left.

Challenge

Write out the adjacency matrix for each of the three graphs, using alphabetical order of the vertex labels, then identify a connection between the three adjacency matrices.

Hint

Look for rows and columns in the different matrices that are identical. The order of the rows and columns would change if you use non-alphabetical reorderings of vertices that correspond to the graph isomorphisms you wrote for the "You try" exercise above.

This textbook does not discuss planar graphs in detail, but it is worth mentioning that it can be proven that neither $K_{5}$ nor $K_{3,3}$ is planar. If you’d like to learn more about planar graphs, one source is the section "Planar Graphs" in Oscar Levin’s Discrete Mathematics: An Open Introduction, 4th edition.

14.9.1. Isomorphism of Graphs with Additional Features

As described above, a graph isomorphism is a one-to-one correspondence between the sets of vertices of two graphs that preserves the adjacency relationship between pairs of vertices. That is, two vertices are endpoints of an edge in one graph if and only if the corresponding vertices are endpoints of an edge in the other graph.

In cases where graphs have additional features beyond the basic adjacency relationship between pairs of vertices, we should only consider two of these “graphs with additional features” to be isomorphic if the additional features are also preserved.
Note: Some sources use the word “attributes” for these additional features.

As an example, in the image, the three graphs with vertex sets $\{ F,\,G,\,H \},$ $\{ J,\,K,\,L \},$ and $\{ P,\,Q,\,R \}$ are isomorphic as graphs if we ignore the edge weights. Also, the two graphs with vertex sets $\{ F,\,G,\,H \}$ and $\{ J,\,K,\,L \}$ are isomorphic as weighted graphs because there is a one-to-one correspondence between the vertex sets that preserves both the adjacency relationships and the corresponding edge weights. The third graph with vertex set $\{ P,\,Q,\,R \}$ is not isomorphic as a weighted graph to either of the other two weighted graphs because its edge weights cannot be matched with the edge weights in the other weighted graphs.

14.10. Graph Coloring

In some contexts, it can be useful to partition either the set of vertices or the set of edges of a graph into disjoint subsets to make it easier to understand the graph and the network it represents. This act of partitioning is usually referred to as "coloring" since using different colors can make it easy to see and interpret the properties of the partition when the graph is drawn. Notice that you could instead create the partition by assigning labels like "group 1," "group 2," and so on, to each vertex (or edge.)
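The idea of assigning labels instead of literal colors is easy to express in code. The sketch below uses the integers $0, 1, 2, \ldots$ as "colors" and shows a simple greedy strategy: give each vertex the smallest label not already used by one of its neighbors. The function names and the sample graph (a cycle with $5$ vertices) are assumptions made for illustration, and the greedy strategy is not guaranteed to use the fewest possible colors.

```python
def is_proper_coloring(edges, color):
    """True if the two endpoints of every edge have different colors."""
    return all(color[u] != color[v] for u, v in edges)


def greedy_coloring(vertices, edges):
    """Color the vertices in the given order, always using the smallest
    color that no already-colored neighbor has."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    color = {}
    for v in vertices:
        used = {color[w] for w in neighbors[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Hypothetical sample graph: a cycle with 5 vertices.
cycle_vertices = [0, 1, 2, 3, 4]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

coloring = greedy_coloring(cycle_vertices, cycle_edges)
print(coloring)                                    # {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
print(is_proper_coloring(cycle_edges, coloring))   # True
```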

For example, the image shows a graph called the Petersen graph with its vertex set partitioned into 3 subsets so that each edge’s endpoints are in two different subsets of the partition (That is, each edge’s endpoints have different colors.)
Image credit: "Petersen_graph_3-coloring.svg" by Д.Ильин. The copyright holder of this work has released this work into the public domain. This applies worldwide.

The next example discusses an application of vertex coloring.

Example 12 - Redrawing a Map as a Graph

The following image represents a "map" showing four countries; the blue region represents one country (not a body of water) that is surrounded by three other countries.

The map can be represented as a graph with vertices colored to match the regions, as shown on the right. If it helps you to connect the graph to the map, imagine that each vertex represents a capital city of the corresponding country.

This way of representing a map was used to prove the Four Color Theorem which states, roughly, that

Four Color Theorem

Any map of countries that can be drawn in a plane such that
(1) every country has a color and
(2) no two adjacent countries have the same color
requires at most four different colors.
In this context "two adjacent countries" share a border that is not just a single point.

The first proof of the theorem was announced in 1976, and a corrected version of the first proof was published in 1989 after some errors were fixed (Yes, professional mathematicians do make mistakes!) The proof was considered controversial by many mathematicians at the time because it was the first major computer-assisted proof: Over one thousand five hundred different cases needed to be checked!

In other contexts, it is more appropriate to use edge coloring. That is, each edge of the graph is assigned a color so that the set of edges is partitioned into disjoint subsets. For example, the graph in the image shows that the complete bipartite graph $K_{4,4}$ can be partitioned as a union of 3 disjoint graphs called forests (Forests are defined later in this textbook, in the Trees chapter.)
Image credit: "K44 arboricity.svg" by David Eppstein. The copyright holder of this work has released this work into the public domain. This applies worldwide.

14.11. Connectivity of Undirected Graphs

A walk on a graph $G=\left(V,E\right)$ is a finite, non-empty, alternating sequence of vertices and edges of the form $v_0e_1v_1e_2\ldots e_nv_n,$ with vertices $v_i\in V$ and edges $e_i\in E,$ where for each integer $i$ with $1 \leq i \leq n$ the endpoints of $e_i$ are the vertices $v_{i-1}$ and $v_i.$ The integer $n$ is called the length of the walk.

If we restrict ourselves to simple undirected graphs, there is at most one edge joining each pair of adjacent vertices, so a walk can be specified simply by listing the sequence of vertices $v_0v_1\ldots v_n$ (That is, we don’t need to write down the edges.)

A trail is a walk that does not repeat an edge. That is, all edges in a trail are distinct.
A path is a trail that does not repeat a vertex (but we allow for the possibility that the initial vertex $v_0$ and terminal vertex $v_n$ of the path are the same vertex; when $v_0=v_n$ the path is called a closed path or a circuit.)
A cycle is a closed path of length at least 1.

The distance $d(u,v)$ between two vertices $u$ and $v$ in a graph $G$ is the number of edges in a shortest path connecting them, assuming such a path exists.

Example 13 - Trails, Paths, and Cycles

In the graphs below the first shows a trail $CFDBFE$. It is not a path since the vertex $F$ is repeated. The second shows a path $CADFB$, and the third a cycle $CADFC$. Also note the following distances, $d(A,D)=1$, while $d(A,F)=2$, and $d(A,E)=3$.
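The claims in Example 13 can be checked by computer. The sketch below assumes that the graph in the example is the graph from Example 6, whose edge set is consistent with every walk and distance listed above. Walks are written as strings of vertex labels, and the function names are illustrative. The distance is computed with a breadth-first search, which explores the graph outward from the starting vertex one edge at a time.

```python
from collections import deque

edges = {frozenset(p) for p in ["AC", "AD", "BD", "BF", "CF", "DF", "EF"]}
adjacency = {}
for e in edges:
    u, v = sorted(e)
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def is_trail(walk, edges):
    """A walk (sequence of vertices) is a trail if it uses only edges of the
    graph and no edge is repeated."""
    steps = [frozenset(pair) for pair in zip(walk, walk[1:])]
    return all(s in edges for s in steps) and len(steps) == len(set(steps))


def is_path(walk, edges):
    """A path is a trail with no repeated vertex, except that the first and
    last vertex may be the same."""
    inner = walk[:-1] if walk[0] == walk[-1] else walk
    return is_trail(walk, edges) and len(set(inner)) == len(inner)


def distance(adjacency, u, v):
    """Number of edges in a shortest path from u to v, or None if none exists."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for w in adjacency[x]:
            if w not in dist:
                dist[w] = dist[x] + 1
                queue.append(w)
    return None


print(is_trail("CFDBFE", edges), is_path("CFDBFE", edges))   # True False
print(is_path("CADFB", edges), is_path("CADFC", edges))      # True True
print(distance(adjacency, "A", "E"))                         # 3
```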

14.12. Connected Graphs

A graph $G$ is connected if there is a path between any pair of vertices.

Example 14 - A graph that is not connected

The graph $G$ below is not connected since, as just one example, there is no path between vertex $a$ and vertex $e.$

$G$ has adjacency matrix

$ \left(\begin{matrix}0&1&1&0&0\\1&0&1&0&0\\1&1&0&0&0\\0&0&0&0&1\\0&0&0&1&0\\\end{matrix}\right). $

In the previous example, the graph $G$ can be treated as a union of two connected subgraphs, called the connected components of $G.$ It can be proven by mathematical induction that any simple undirected graph that has a finite number of vertices can be written as a union of a finite number of connected components.
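The connected components can be found from the adjacency matrix by repeatedly starting at an unvisited vertex and collecting every vertex that can be reached from it. The sketch below applies this to the matrix of Example 14; it assumes that the rows correspond to vertices $a, b, c, d, e$ in that order, and the function name is hypothetical.

```python
def connected_components(matrix, labels):
    """Return the connected components as a list of sorted lists of labels."""
    n = len(matrix)
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            i = stack.pop()
            component.append(labels[i])
            for j in range(n):
                if matrix[i][j] and j not in seen:   # j is an unvisited neighbor of i
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(component))
    return components


G = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]

components = connected_components(G, "abcde")
print(components)             # [['a', 'b', 'c'], ['d', 'e']]
print(len(components) == 1)   # False, so the graph is not connected
```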

14.13. Eulerian Graphs

An Euler path on a graph is a trail that uses each edge of the graph exactly once.

An Euler circuit (also called an Eulerian circuit) is a closed trail containing each edge of the graph $G$ exactly once, so it returns to its start vertex. A graph with an Euler circuit is called Eulerian or is said to be an Eulerian graph.

In the following, the first graph is Eulerian. The sequence of edges $e_1 e_2 e_3 e_4 e_5 e_6 e_7$ describes an Euler circuit (Notice that some vertices are visited multiple times; it is the edges that must appear exactly once in an Euler path.) The second graph is not an Eulerian graph. Convince yourself of this fact by looking at all necessary trails or closed trails.

The following useful characterizations of graphs with Euler circuits and Euler paths are due to Leonhard Euler.

Theorem on Euler Circuits and Euler Paths

A finite connected graph has an Euler circuit if and only if each vertex has even degree.
A finite connected graph has an Euler path if and only if it has at most two vertices with odd degree.

Euler solved a famous problem about the seven bridges of Königsberg by representing the problem as a graph (with parallel edges.)
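The theorem turns the search for an Euler circuit or Euler path into a count of odd-degree vertices. The sketch below assumes that the graph is already known to be finite and connected and that it is given as a list of edges, so that parallel edges can be listed more than once. The labels $A$ (the island) and $B, C, D$ for the four land masses of Königsberg are hypothetical, as are the function names.

```python
from collections import Counter


def odd_degree_vertices(edge_list):
    """Return the sorted list of vertices that have odd degree."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def euler_classification(edge_list):
    """Apply the theorem to a finite connected graph given as an edge list."""
    odd = len(odd_degree_vertices(edge_list))
    if odd == 0:
        return "Euler circuit"
    if odd == 2:
        return "Euler path"
    return "neither"


# The seven bridges, with hypothetical labels for the four land masses.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

print(odd_degree_vertices(bridges))    # ['A', 'B', 'C', 'D']
print(euler_classification(bridges))   # neither
```

The number of odd-degree vertices in any graph is even (a consequence of the Handshake Lemma), which is why the code only needs to distinguish the cases $0,$ $2,$ and "more than $2.$"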

14.14. Hamiltonian Graphs

A cycle in a graph $G$ is called a Hamiltonian cycle if it visits every vertex of $G$ exactly once, except for the starting vertex, which is visited again at the end of the cycle.

A graph is Hamiltonian, or said to be a Hamiltonian graph, if it contains a Hamiltonian cycle.

The following graph is Hamiltonian and shows a Hamiltonian cycle $ABCDA$, highlighted (Notice that some edges of the graph may not be used at all; it is the vertices, other than the starting and ending vertex, that must appear exactly once in a Hamiltonian cycle.) The second graph is not Hamiltonian.

Theorem (Dirac) on Hamiltonian graphs

A simple graph with $n \geq 3$ vertices is Hamiltonian if every vertex $v$ has degree $d(v)\geq \frac{n}{2}$.
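Dirac's condition is easy to test because it only involves the degrees of the vertices. The sketch below assumes the graph is given as a dictionary that maps each vertex to the set of its adjacent vertices; the function name and the two sample graphs are illustrative.

```python
def satisfies_dirac(adjacency):
    """True if the graph has n >= 3 vertices and every degree is at least n/2."""
    n = len(adjacency)
    return n >= 3 and all(len(nbrs) >= n / 2 for nbrs in adjacency.values())


# K_4: every vertex has degree 3, and 3 >= 4/2.
k4 = {v: set("ABCD") - {v} for v in "ABCD"}

# A cycle with 5 vertices: every vertex has degree 2, and 2 < 5/2.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

print(satisfies_dirac(k4))   # True
print(satisfies_dirac(c5))   # False
```

The second sample graph is a reminder that the theorem is an "if" statement, not an "if and only if" statement: the cycle with $5$ vertices is itself a Hamiltonian cycle, so the graph is Hamiltonian even though the degree condition fails.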

14.15. Finding A Shortest Path in a Weighted Graph: Dijkstra’s Algorithm

In some applications of graph theory, you need to find a "shortest path" between two vertices of a weighted graph. In this context, shortest may mean "of least distance" but could mean "of least cost" or something else, depending on what the edge weights represent.

Edsger Dijkstra published a paper in 1959 that describes an algorithm for finding the path of "minimum total weight" between two given vertices of a simple connected graph with weighted undirected edges.
Dijkstra’s original paper is also available in the ACM Digital Library at this link.

Here is a description of the algorithm, based on Dijkstra’s original.

Task: Given two vertices $a$ and $z,$ find the edges of a path between the two vertices that has the minimum possible sum of weights.
Input: The list $V$ of all vertices of the graph, with the two vertices $a$ and $z$ specified, and the list $E$ of all weighted edges of the graph.
For example, the input could be an adjacency matrix for the graph, with the first row of the matrix corresponding to $a$ and the last row corresponding to $z$.
Steps:
1. Define four lists $V_{chosen},$ $V_{candidates},$ $E_{chosen},$ and $E_{candidates},$ and initialize each list as an empty list.
2. Append vertex $a$ to the end of $V_{chosen}.$
3. While vertex $z$ has not been appended to $V_{chosen}$
  1. Set $v$ to the last vertex appended to $V_{chosen}.$
  2. For each vertex $w$ that is not in $V_{chosen}$ but is adjacent to vertex $v,$ let $e$ be the edge that connects $v$ and $w.$
    
    If $w$ is in $V_{candidates}$ and using $e$ gives a path between $a$ and $w$ whose total weight is less than the total weight of the known path that uses the corresponding edge in list $E_{candidates},$ remove that edge from $E_{candidates}$ and append $e$ to $E_{candidates}.$ If $w$ is in $V_{candidates}$ but using $e$ does not give a path of smaller total weight, make no change.
    
    Otherwise, if $w$ is not in $V_{candidates}$ (so $w$ is in neither list $V_{chosen}$ nor list $V_{candidates}$), append vertex $w$ to the end of $V_{candidates}$ and append the edge $e$ to the end of $E_{candidates}.$
  3. After exiting the "for" loop,
    
    find the vertex $w$ in list $V_{candidates}$ that has the minimal-weight path to the starting vertex $a$ and append $w$ to the end of $V_{chosen},$ and remove $w$ from $V_{candidates},$ and
    
    append the edge in $E_{candidates}$ that has $w$ as one of its endpoints to the end of $E_{chosen}$ and remove that edge from $E_{candidates}.$
Output: The list $E_{chosen}$ of weighted edges.

Notice that the list $E_{chosen}$ is constructed so that it contains edges for only one possible path between $a$ and $z,$ and that path must be a minimal-weight path.

Also notice if the loop condition is changed to "while there is a vertex that is not in $V_{chosen}$" then the algorithm’s output $E_{chosen}$ will find the edges needed for a possible minimal-weight path between vertex $a$ and any other vertex in the graph.
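The following Python sketch shows one common way to code the same idea. It is not a line-by-line translation of the steps above: instead of the lists $V_{candidates}$ and $E_{candidates},$ it keeps the candidates in a priority queue (Python's `heapq` module) and records, for each vertex, the previous vertex on the best known path. The function name `dijkstra`, the sample graph, and its weights are hypothetical, and the code assumes that all edge weights are nonnegative.

```python
import heapq


def dijkstra(weighted_edges, a, z):
    """Return (total_weight, path_edges) for a minimum-weight path from a
    to z, or (None, []) if z cannot be reached from a."""
    adjacency = {}
    for (u, v), weight in weighted_edges.items():
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))

    best = {a: 0}      # best known total weight of a path from a to each vertex
    previous = {}      # previous[w] is the vertex just before w on that path
    chosen = set()     # plays the role of V_chosen
    heap = [(0, a)]    # candidate vertices, ordered by total weight from a
    while heap:
        total, v = heapq.heappop(heap)
        if v in chosen:
            continue   # an out-of-date entry for a vertex that was already chosen
        chosen.add(v)
        if v == z:
            break
        for w, weight in adjacency.get(v, []):
            if w not in chosen and total + weight < best.get(w, float("inf")):
                best[w] = total + weight
                previous[w] = v
                heapq.heappush(heap, (best[w], w))

    if z not in chosen:
        return None, []
    path_edges = []
    w = z
    while w != a:      # walk backward from z to a to recover the path
        path_edges.append((previous[w], w))
        w = previous[w]
    path_edges.reverse()
    return best[z], path_edges


# Hypothetical weighted graph with vertices a, b, c, z.
sample = {("a", "b"): 4, ("a", "c"): 1, ("c", "b"): 2,
          ("b", "z"): 5, ("c", "z"): 8}
print(dijkstra(sample, "a", "z"))
# (8, [('a', 'c'), ('c', 'b'), ('b', 'z')])
```

In the sample graph, the direct-looking routes through $b$ alone or through $c$ alone each have total weight $9,$ while the route that visits $c$ and then $b$ has total weight $1+2+5=8.$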

Question: What change would be needed to the input if you had a graph with unweighted edges and needed to find a path from $a$ to $z$ that uses the smallest number of edges possible?

This Wikipedia page has some animations that illustrate an alternate implementation of Dijkstra’s algorithm.

14.16. The Traveling Salesperson Problem (TSP)

The Traveling Salesperson Problem (TSP) can be stated as “A traveling salesperson needs to start in the home city, visit each of a number of other cities, and then return to the home city. What path should the salesperson take so that the total distance traveled is the least possible?”

The TSP can be modeled using a graph. If there are $n$ cities, you can represent each city as a vertex of the complete graph $K_{n}$ and assign to each edge a weight equal to the distance between the cities at the endpoints. The TSP is solved by finding a Hamiltonian cycle of minimum total weight, that is, a minimum-weight cycle that visits each vertex exactly once before returning to the starting vertex.

Notice that if you choose a vertex (city) as the starting and ending point, then there are $\frac{1}{2}(n-1)!$ different Hamiltonian cycles (The division by 2 represents that you could reverse the direction of the cycle without changing the total distance traveled.)

The brute-force solution examines every possible Hamiltonian cycle and has time complexity $\Theta (n!),$ which is infeasible for even relatively small values of $n.$ The Bellman–Held–Karp algorithm solves the TSP with time complexity $\Theta (2^{n}n^{2}).$ Also, there are other methods to find “approximate” solutions to the TSP that are “good enough” for some problems. At present, there is no known algorithm with polynomial worst-case time complexity that solves the TSP.
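For a very small number of cities, the brute-force approach fits in a few lines. The sketch below assumes the distances are given as a symmetric matrix (a list of lists) and that city $0$ is the home city; the function name and the distances for the four cities are made up for illustration. It tries all $(n-1)!$ orderings of the other cities, so each Hamiltonian cycle is examined twice, once in each direction.

```python
from itertools import permutations


def brute_force_tsp(dist, home=0):
    """Return (length, tour) for a shortest tour that starts and ends at home."""
    n = len(dist)
    others = [c for c in range(n) if c != home]
    best_length, best_tour = None, None
    for order in permutations(others):
        tour = (home,) + order + (home,)
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
        if best_length is None or length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


# Hypothetical distances between 4 cities; dist[i][j] == dist[j][i].
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]

print(brute_force_tsp(dist))   # (18, (0, 1, 3, 2, 0))
```

With $4$ cities there are only $3! = 6$ orderings to check, but with $20$ cities there would already be $19!$ orderings, which is more than $10^{17}.$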

14.17. Additional topics will be added to this chapter soon!

Transitive closure (Floyd’s algorithm)
Topological sort

MORE TO COME!

15. Trees

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on December 8, 2025.
Added more content to the section on binary trees: Tree traversal algorithms, Łukasiewicz’s parentheses-free notation (“Polish notation”) and expression trees.
Revised section on directed trees. Rewrote theorem on second characterization of trees.

A tree is a connected simple graph that contains no cycles. Trees are used to model decisions, to sort data, and to optimize networks.

Key terms and concepts covered in this chapter:

Trees
- Definition of “tree” and “forest”
- Properties of trees
- Spanning trees
- Binary trees
  - Traversal strategies
  - Expression trees

15.1. Definitions, Examples, and Properties of Trees and Forests

In this section, you will learn some of the basics about trees.

Much of the terminology related to trees is not standardized, that is, different textbooks and sources use different terminology for the same types of trees. The Remix uses terminology consistent with Handbook of Graph Theory, Second Edition by Gross, Yellen, and Zhang.

Recall that a simple graph has no loops and no parallel edges.

Also, recall that a cycle in a graph is a path that starts and ends at the same vertex.

A tree is a simple graph that is connected and has no cycles. Some sources use the term acyclic to mean "has no cycles."

A forest is the union of several trees. In other words, a forest is a simple graph that has one or more connected components, where each connected component is a tree.

The image shows a forest composed of three trees.

Notice that if you choose any pair of vertices in one of the trees in the image, there is only one path that joins that pair of vertices. In fact, this property is True for any tree.

Theorem

Suppose that $G$ is a simple graph. The following statements are logically equivalent.

$G$ is a tree.
$G$ is a simple graph such that for each pair of vertices of $G$ there is exactly one path between the vertices.

Proof

First, assume that $G$ is a tree; we will prove, using proof by contradiction, that for every pair of vertices of $G$ there is exactly one path between the vertices. By assumption, $G$ is connected and for every pair of vertices in $G$ there is at least one path joining those two vertices, so we only need to show that there cannot be two different paths that connect the same pair of vertices. To work toward a contradiction, let’s suppose that for some pair of vertices $u$ and $v$ there are two different paths between $u$ and $v;$ then we can start to go from $u$ to $v$ along a first path and then “turn around and go back” to $u$ along a second path. This means that there must be a cycle in $G$ that starts and ends at $u.$

Why must there be a cycle?

If we can go all the way from $u$ to $v$ along the first path and then go all the way back, in reverse order, from $v$ to $u$ along the second path without repeating any edges or vertices (except $u$) then we have found a cycle that starts and ends at $u.$
If the two paths have some common edges or vertices, we can use only part of each path to create a cycle that does not repeat any edges or vertices. To do this, write the first path as $v_{0}v_{1}...v_{n-1}v_{n}$ where $u = v_{0}$ and $v = v_{n}$ and let $j$ stand for the smallest positive index such that vertex $v_{j}$ appears in both the first and second paths. Now create a cycle by using the beginning of the first path to go from $u$ to $v_{j}$ and then using the beginning part of the second path that goes from $u$ to $v_j$, but in reverse order, to go back from $v_{j}$ to $u.$ Notice that the index $j$ is chosen so that no edge or vertex (except $u$ and $v_j$) that we use can belong to both of these shorter paths from $u$ to $v_{j},$ so no edges are repeated when we use this path to go from $u$ back to $u$ - we have created a cycle.

To continue with the proof by contradiction, we’ve shown that there is a cycle in $G,$ but this contradicts the assumption that $G$ is a tree that has no cycles. This means that the assumption "there is a pair of vertices in $G$ that are connected by two different paths" must be False. We have proven that for any pair of vertices in $G$ there is exactly one path joining that pair of vertices.

Secondly, assume that $G$ is a simple graph and that for every pair of vertices of $G$ there is exactly one path between the vertices. We will prove, using proof by contradiction, that $G$ must be a tree. By assumption, $G$ is connected because for any pair of vertices there is a path between those vertices, so we only need to prove that $G$ has no cycles. To reach a contradiction, let’s suppose that $G$ does have a cycle of the form $v_{0}v_{1} \cdots v_{n-1}v_{n}$ where $v_{0} = v_{n}$ (that is, $v_{0}$ and $v_{n}$ are the same vertex, but no other vertex is repeated in the cycle.) Notice that, because $G$ is a simple graph, the integer $n$ must be greater than or equal to 3.

Why must n be greater than or equal to 3?

Notice that $n$ must be positive since there are at least two vertices in the path.
If $n=1$ then the cycle is $v_{0}v_{1}$ where $v_{0} = v_{1},$ but this cycle consists of a single loop, which contradicts that $G$ is a simple graph that has no loops.
If $n=2$ then the cycle is $v_{0}v_{1}v_{2}$ where $v_{0} = v_{2},$ but this cycle would use the same edge twice which contradicts the definition of a cycle.
This shows that $n \geq 3.$

This means that we have two different paths, $v_{0}v_{1}$ and $v_{1} \cdots v_{n},$ between $v_0$ and $v_1$ (because $v_{0}$ and $v_{n}$ are the same vertex.) Now we can reverse the order of vertices in the second path to get two different paths between the vertices $v_{0}$ and $v_{1}$ - but we assumed that for every pair of vertices of $G$ there is exactly one path between those vertices. We have derived a contradiction, which means that $G$ cannot have a cycle. Therefore, $G$ is connected and has no cycles, which is the definition of a tree, so $G$ is a tree.

Q.E.D.

Next, we will give another characterization of trees, after first proving a lemma.

Lemma

Suppose that $G$ is a connected simple graph with $n$ vertices and $n-1$ edges, where $n$ is some positive integer greater than or equal to $2.$ Then $G$ must have at least one vertex of degree $1.$

Proof

$G$ is assumed to be a connected graph, so that every vertex is the endpoint of at least one edge. That is, the degree of every vertex is greater than or equal to $1.$

We now use proof by contradiction to show that there is at least one vertex that has degree equal to $1.$ To work towards a contradiction, suppose that every vertex has degree greater than $1,$ that is, the degree of every vertex is greater than or equal to $2.$ This means that the sum of the degrees of all vertices is greater than or equal to $2n.$ By the Handshake Lemma, the sum of the degrees of the vertices must be equal to $2$ times the number of edges, which in this case is $2(n-1).$ We have obtained a contradiction, since the Handshake Lemma proves that the sum of the degrees of the vertices must be equal to $2(n-1)$, but the sum of the degrees of the vertices must also be greater than or equal to $2n.$ From this contradiction we can conclude that it is False that the degree of every vertex is greater than or equal to $2,$ so there must be at least one vertex with degree less than $2.$ Recalling that the degree of every vertex is greater than or equal to $1$ proves that at least one vertex has degree $1.$

Theorem

Suppose that $G$ is a connected simple graph with finitely many vertices. The following statements are logically equivalent.

$G$ is a tree.
If $G$ has $n$ vertices, where $n$ is a positive integer, then $G$ has $n-1$ edges.

Proof

First, assume $G$ is a tree with $n$ vertices. We will use mathematical induction on $n$ to prove that the number of edges must be $n-1.$

Let $P(n)$ be the predicate \[P(n): \text{If a tree has } n \text{ vertices then the tree has } n-1 \text{ edges.}\] We will prove that the proposition $(\forall n \in \mathbb{N}_{>0})P(n)$ is True.

Basis Step: $P(1)$ is True since a tree that has $1$ vertex has $0$ edges (otherwise, the edge would have to be a loop, but trees are simple graphs that don’t have loops.)
We can also prove that $P(2)$ is True in case the proof of $P(1)$ feels unsatisfying. If a tree has $2$ vertices, then since a tree is a connected simple graph, the $2$ vertices must be connected by a path, and since a tree cannot have parallel edges, the vertices are the endpoints of exactly $1$ edge. This proves that $P(2)$ is True.

Induction Step: First, we assume that the induction hypothesis $P(k)$ is True for some positive natural number $k$.

Assume that $P(k)$ is True for the positive integer $k,$ that is, if a tree has $k$ vertices then the tree has $k-1$ edges. We can assume $k \geq 2$ since the cases when $k \in \{ 1, 2 \}$ were proved in the Basis Step. Suppose that we have a tree $T$ that has $k+1$ vertices; we will prove that the tree must have $k$ edges. First, there must be at least one vertex $v$ in $T$ such that the degree of $v$ is $1$: If every vertex had degree at least $2,$ then we could find a cycle in $T,$ which cannot be True since $T$ is a tree. Remove one vertex $v$ that has degree $1$ along with the edge that has $v$ as an endpoint to obtain the subgraph $T-v.$ Notice that for every pair of vertices of $T-v$ there is exactly one path between the vertices, and applying the previous theorem shows that $T-v$ is a tree. Also, $T-v$ has $k$ vertices because we removed only $v,$ so we can apply the Induction Hypothesis to conclude that $T-v$ has $k-1$ edges. Now, reinsert vertex $v$ and the edge that was removed to obtain the tree $T$ that has $k+1$ vertices and $k$ edges. Therefore, if $P(k)$ is True then $P(k+1)$ is True, too. That is, $P(k) \rightarrow P(k+1)$.

Conclusion Step: We have proven both the Basis Step and the Induction Step. Therefore, we can conclude that for all positive integers $n,$ if a tree has $n$ vertices then the tree has $n-1$ edges.

Secondly, assume that $G$ is a connected simple graph that has a finite number of vertices such that the number of edges is one less than the number of vertices. We will use mathematical induction on the number of vertices to prove that $G$ is a tree.

Let $P(n)$ be the predicate \[P(n): \text{If } G \text{ is a connected simple graph that has } n \text{ vertices and } n-1 \text{ edges then } G \text{ is a tree.}\] We will prove that the proposition $(\forall n \in \mathbb{N}_{>0})P(n)$ is True.

Basis Step: $P(1)$ is True since a connected simple graph that has $1$ vertex and $0$ edges is a tree.
We can also prove that $P(2)$ is True in case the proof of $P(1)$ feels unsatisfying. If $G$ is a connected simple graph that has $2$ vertices and $1$ edge, then the edge cannot be a loop so must have $2$ different endpoints. Since there are only $2$ vertices in $G$, the edge must have those vertices as its endpoints, so $G$ consists of a single edge with distinct endpoints. Since there is exactly one path from one of the vertices to the other, the previous theorem can be applied to show that $G$ is a tree. This proves that $P(2)$ is True.

Induction Step: First, we assume that the induction hypothesis $P(k)$ is True for some positive natural number $k$.

Assume that $P(k)$ is True for the positive integer $k,$ that is, if a connected simple graph has $k$ vertices and $k-1$ edges then the graph is a tree. We can assume $k \geq 2$ since the cases when $k \in \{ 1, 2 \}$ were proved in the Basis Step. Suppose that we have a connected simple graph $G$ that has $k+1$ vertices and $(k+1)-1$ edges; we will prove that $G$ is a tree. From the lemma proved earlier, we know that at least one of the vertices in $G$ must have degree exactly equal to $1;$ let $v$ be such a vertex. Now consider the graph $G-v$ obtained by removing $v$ and the one edge $e_v$ that has $v$ as an endpoint: It is a connected simple graph with $k$ vertices and $k-1$ edges, so the Induction Hypothesis lets us conclude that $G-v$ is a tree. In particular, there is exactly one path between any two vertices in $G-v.$ Now reattach $v$ and the edge $e_v$ that has $v$ as an endpoint. Notice that there is exactly one path between any two vertices in $G:$ Either the two vertices are in $G-v$ (so there is exactly one path) or one vertex, $w,$ is in $G-v$ and the other vertex is $v,$ but in this second case there is exactly one path which goes from $v$ to the other endpoint of the edge $e_v$ and then continues as a path in $G-v$ from that other endpoint to the vertex $w.$ Since there is only one path between $w$ and the other endpoint of $e_v$ in $G-v,$ this path between $w$ and $v$ is the only one possible path in $G.$ In summary, we have shown that there is exactly one path between any two vertices in $G,$ and the previous theorem applies to prove that $G$ is a tree.

Q.E.D.
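The theorem gives a simple computational test for trees: check that the graph is connected, then count the edges. The following Python sketch is one way to do this, assuming that the graph is simple, has at least one vertex, and is given as a list of distinct vertices and a list of two-element edges. The function names `is_connected` and `is_tree` are illustrative.

```python
from collections import deque

def is_connected(vertices, edges):
    """Breadth-first search from one vertex; True if every vertex is reached."""
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adjacent[u] - seen:
            seen.add(w)
            queue.append(w)
    return len(seen) == len(adjacent)

def is_tree(vertices, edges):
    """Apply the theorem: a connected simple graph is a tree iff it has n - 1 edges."""
    return is_connected(vertices, edges) and len(edges) == len(adjacent_count(vertices)) - 1

def adjacent_count(vertices):
    """Return the set of distinct vertices (guards against accidental repeats)."""
    return set(vertices)

vertices = ["a", "b", "c", "d"]
print(is_tree(vertices, [("a", "b"), ("b", "c"), ("c", "d")]))              # True
print(is_tree(vertices, [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]))  # False: 4 edges
print(is_tree(vertices, [("a", "b"), ("b", "c"), ("c", "a")]))              # False: not connected
```

The last example has $n-1$ edges but is not connected, which shows why the theorem assumes connectedness before counting edges.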

15.2. Spanning Trees and Spanning Forests

Recall that a subgraph of a graph $G$ is a graph $H$ such that every vertex of $H$ is a vertex of $G$ and every edge of $H$ is an edge of $G$ (with both endpoints in the vertex set of $H.$)

A subtree of a graph $G$ is a subgraph of $G$ that is also a tree. Likewise, a subforest of $G$ is a subgraph of $G$ that is also a forest.

A spanning tree of a graph $G$ is a subgraph of $G$ that is a tree that includes all the vertices of $G.$ Likewise, a spanning forest of a graph $G$ is a subgraph of $G$ that is a forest that includes all the vertices of $G.$

The image shows the graph $K_{4}$ along with three spanning trees.

The image shows the graph $K_{4}$ along with a subgraph that is a subtree that is not a spanning tree, and also a subgraph that is a spanning forest.

Spanning trees are used to solve problems that involve simplifying or optimizing networks. You can learn more about some of the applications of spanning trees at this Wikipedia page.

The following subsection presents one such application.

15.2.1. Minimal-Weight Spanning Trees in Weighted Graphs

In some applications of graphs with weighted edges, you may need to find a spanning tree that has the minimal total weight possible, that is, a spanning tree with sum of edge weights less than or equal to the corresponding sum for any other spanning tree. Such a spanning tree is referred to as a minimal-weight spanning tree.
Note: Many textbooks and sources use the term minimum length spanning tree because the use of these spanning trees historically arose in problems that involved physical distances between nodes of a network. Other sources use the term minimum spanning tree, abbreviated as MST.

As an example, the image shows a weighted graph along with all three possible spanning trees. The minimal-weight spanning tree, with total weight 6, is drawn on the lower left.

Notice that for the graph in the image, it was both easy and efficient to use “brute force” to look at all of the spanning trees and compute all of the sums of weights for those spanning trees. For weighted graphs that have many more vertices and/or edges, you will need to use a more efficient problem-solving strategy. This textbook discusses one such strategy, Kruskal’s algorithm, in detail.
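To make the “brute force” strategy concrete, here is a minimal Python sketch. It assumes that the weighted graph in the image is the one whose edge table appears in Example 1 later in this section; the function names are illustrative. The sketch tries every set of $|V| - 1$ edges, keeps the sets that connect all of the vertices (which, by the theorem in the previous section, are exactly the spanning trees), and returns one of least total weight.

```python
from itertools import combinations

def is_spanning_tree(vertices, chosen):
    """True if the chosen edges connect all vertices using exactly |V| - 1 edges."""
    if len(chosen) != len(vertices) - 1:
        return False
    # Grow the set of vertices reachable from one starting vertex.
    reached = {next(iter(vertices))}
    grew = True
    while grew:
        grew = False
        for u, v, _ in chosen:
            if (u in reached) != (v in reached):
                reached.update((u, v))
                grew = True
    return reached == set(vertices)

def brute_force_mst(vertices, weighted_edges):
    """Return (a least-weight spanning tree, the number of spanning trees examined)."""
    candidates = [
        subset
        for subset in combinations(weighted_edges, len(vertices) - 1)
        if is_spanning_tree(vertices, subset)
    ]
    best = min(candidates, key=lambda tree: sum(w for _, _, w in tree))
    return best, len(candidates)

# Edge list assumed from the table in Example 1: (endpoint, endpoint, weight).
vertices = ["a", "b", "c", "d"]
weighted_edges = [("c", "d", 1), ("a", "b", 2), ("b", "c", 3), ("a", "c", 5)]
best, count = brute_force_mst(vertices, weighted_edges)
print(count)                        # 3
print(sum(w for _, _, w in best))   # 6
```

The number of edge subsets grows very quickly with the size of the graph, which is why this approach is only practical for small examples.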

Kruskal’s Algorithm

Joseph Kruskal published a paper in 1956 that describes an algorithm for constructing “the shortest spanning subtree” of a connected simple graph (Kruskal assumed that the graph has only finitely-many edges, and that each edge has a positive weight which represents the distance between its endpoints.)

Task: Given a connected graph $G = (V,E)$ with weighted edges, construct a spanning tree of $G$ that has the minimal possible sum of weights.
Input: The list $E$ of all weighted edges of the graph.
Steps:
1. Sort the list of edges $E$ so that each edge $e_{k}$ in the list has a weight that is less than or equal to the weight of the next edge $e_{k+1}$ in the list.
2. Define the list $E_{chosen}$ and initialize it as an empty list.
3. Set integer index variable $i$ to 0.
4. While $i$ is less than $|E|,$ the number of edges
  1. If it is impossible to form a cycle using edge $e_{i}$ along with some (or all) of the edges in list $E_{chosen},$ then append $e_{i}$ to $E_{chosen}.$
  2. Increment $i$ by 1
Output: The list $E_{chosen}$ of weighted edges.

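Notice that the output list $E_{chosen}$ was constructed so that its edges cannot be used to form a cycle in the graph $G.$ Also, since the graph $G$ is assumed to be connected, the edges in $E_{chosen}$ connect all of the vertices of $G$: If they did not, then some edge of $G$ would join two vertices that are not connected by edges in $E_{chosen},$ and the algorithm would have appended that edge because it cannot form a cycle with the chosen edges. Therefore, the graph with vertex set $V$ and edge set $E_{chosen}$ will be a spanning tree of $G.$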

Also notice that the condition for the while loop can be changed to “while $|E_{chosen}| < |V| - 1$” since a spanning tree must have exactly one fewer edge than it has vertices.
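The steps above can be sketched in Python as follows. The text does not say how to test whether an edge would form a cycle; this sketch assumes one common choice, a disjoint-set (“union-find”) structure that tracks which vertices are already connected by chosen edges. The function name `kruskal` and the representation of an edge as an `(endpoint, endpoint, weight)` triple are also assumptions made for illustration.

```python
def kruskal(vertices, weighted_edges):
    """Return the list E_chosen built by Kruskal's algorithm.

    `weighted_edges` is a list of (u, v, weight) triples for a connected graph.
    """
    # Step 1: sort the edges by nondecreasing weight.
    edges = sorted(weighted_edges, key=lambda edge: edge[2])

    # Each vertex starts in its own component; parent[v] points toward
    # the representative of the component that contains v.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # shorten the chain as we walk up
            v = parent[v]
        return v

    chosen = []  # Step 2: E_chosen starts empty.
    i = 0        # Step 3
    # Step 4, also using the alternative loop condition so the loop can stop early.
    while i < len(edges) and len(chosen) < len(vertices) - 1:
        u, v, weight = edges[i]
        root_u, root_v = find(u), find(v)
        # The edge closes a cycle exactly when u and v are already connected
        # by chosen edges, that is, when they lie in the same component.
        if root_u != root_v:
            chosen.append((u, v, weight))
            parent[root_u] = root_v  # merge the two components
        i += 1
    return chosen

# Hypothetical weighted graph on four vertices.
edges = [(1, 4, 5), (1, 2, 1), (2, 4, 6), (2, 3, 2), (3, 4, 4), (1, 3, 3)]
tree = kruskal([1, 2, 3, 4], edges)
print(tree)  # [(1, 2, 1), (2, 3, 2), (3, 4, 4)]
```

In this run the edge with weight 3 is skipped because its endpoints are already connected by the two lighter edges.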

Example 1 - An example of using Kruskal’s algorithm

The image shows a drawing of a simple weighted graph along with all three possible spanning trees.

The minimal-weight spanning tree, with total weight 6, is drawn on the lower left.

Let’s trace through the steps of Kruskal’s algorithm to examine how this minimal-weight spanning tree is constructed. The following table represents the input to the algorithm: A list of all the edges of the graph, along with their corresponding weights.

Edge	Weight
{c, d}	1
{a, b}	2
{b, c}	3
{a, c}	5

Steps:
1. Sort the list of edges of the graph in order of increasing weight (This has already been done in the table shown.)
2. Set $E_{chosen}$ to the empty list.
3. Set the index $i$ to 0.
4. Enter the while loop.
  
  $i=0$
  
  $i$ is less than 4, and it is impossible to form a cycle using $\{c, d\}$ alone,
  so set $E_{chosen}$ to $[\{c, d\}]$ and set $i$ to 1.
  
  $i=1$
  
  $i$ is less than 4, and it is impossible to form a cycle using $\{a, b\}$ along with the edge in $E_{chosen} = [\{c, d\}],$
  so set $E_{chosen}$ to $[\{c, d\} , \{a, b\}]$ and set $i$ to 2.
  
  $i=2$
  
  $i$ is less than 4, and it is impossible to form a cycle using $\{b, c\}$ along with the edges in $E_{chosen} = [\{c, d\} , \{a, b\}],$
  so set $E_{chosen}$ to $[\{c, d\} , \{a, b\}, \{b, c\}]$ and set $i$ to 3.
  
  $i=3$
  
  $i$ is less than 4, but it is possible to form a cycle using $\{a, c\}$ along with the edges in $E_{chosen} = [\{c, d\} , \{a, b\}, \{b, c\}],$
  so make no change to $E_{chosen}$ and set $i$ to 4.
  
  $i=4$
  
  Since $i$ is not less than 4, exit the while loop.

The output is $E_{chosen} = [\{c, d\} , \{a, b\}, \{b, c\}],$ which is the list of edges used in the spanning tree drawn on the lower left of the image. That is, in at least this one case, the algorithm does construct a minimal-weight spanning tree.

Question

How could you validate Kruskal’s algorithm?
That is, how could you prove that Kruskal’s algorithm must construct a minimal-weight spanning tree for any input graph that is connected and has finitely-many weighted edges?

Hint

Try to create a predicate that you could prove True for all natural numbers by using mathematical induction.

Here is an exercise for you to try.

Check Your Understanding

Use Kruskal’s algorithm to find the minimal-weight spanning tree for the following graph.

The image shows a graph with vertices representing ten of the busiest airports, by total cargo throughput, in the United States of America. Each vertex is labeled with the International Air Transport Association code for one of the airports, and each edge represents the route between the two airports at the endpoints.
Image credit: Remixer-created derivative of original work "9-simplex graph.png". The original work has been released into the public domain by its author, Tomruen at English Wikipedia. This applies worldwide.

This table lists each edge with its corresponding weight.

You can download the table as a tab-delimited text file to import the data into a spreadsheet app to sort the list.

You also can view the graph as a map of the routes between the airports at Markus Englund’s Great Circle Map website.

Try to work this out on your own, then confirm that you correctly found the minimal-weight spanning tree by clicking to see the answer below.

Answer

The image shows the edges of the minimal-weight spanning tree in orange.

Image credit: Remixer-created derivative of original work "9-simplex graph.png". The original work has been released into the public domain by its author, Tomruen at English Wikipedia. This applies worldwide.

The table lists the edges of the minimal-weight spanning tree.
A map view of the minimal-weight spanning tree can be seen at the Great Circle Map website.

Other Algorithms for Minimal-Weight Spanning Trees

Among the other algorithms that can be used to find a minimal-weight spanning tree are Borůvka’s algorithm and Prim’s algorithm (also known as the DJP algorithm.) You can learn more about these and other algorithms and their history starting with this section of the Wikipedia page about MSTs, as well as the section Historical Notes from Applied Combinatorics by Keller and Trotter.

In 2002, Pettie and Ramachandran published a paper describing an optimal minimum spanning tree algorithm.

15.3. Directed Trees

Recall that a tree is defined to be a connected simple graph that contains no cycles. The edges of a tree are undirected.

However, there are applications where it is useful to think of a tree as having directed edges. For that purpose, a directed tree is defined to be a directed graph whose underlying graph is a tree.

15.3.1. Rooted Trees

In some applications of directed trees, you can view the directed tree as if all paths “flow away” from a single vertex. For these applications, the concept of a rooted tree is useful.

The set of rooted trees can be defined recursively as follows.

Basis Step

A single vertex $r$ is a rooted tree. The vertex $r$ is called the root node of this rooted tree.

Recursion

Suppose that you have a nonempty finite set of already-constructed rooted trees (which will be called “old rooted trees” below) such that the following propositions are True:
(1) No vertex appears in more than one of the old rooted trees.
(2) No edge has its two endpoints in two different old rooted trees.
You can construct a new rooted tree by first creating a new root node $r$ that is not a vertex of any of the old rooted trees, and then, for each of the old root nodes, creating a new directed edge from $r$ to that old root node. That is, each new directed edge has initial vertex $r$ and terminal vertex the root node of one of the old rooted trees.

The image shows the basis step and represents, in part, the results of the first and second uses of the recursion step. In the image, the new root node created at each step is drawn at the top of the rooted tree.

Notice that the directed edges of rooted trees are drawn without arrows! Computer scientists usually follow this convention: A rooted tree is drawn with the root node at the top, with all the old root nodes of the previously-constructed rooted trees attached in the Recursion step drawn at the same horizontal level beneath the root node. This convention, along with the recursive definition, ensures that the direction of each directed edge is “down,” so an arrow is not needed to determine the direction of an edge.

Notice that if you want to construct just one particular rooted tree, you only need to use the Recursion step finitely-many times. However, at each use of the Recursion step, there is an infinite number of possible rooted trees that can be constructed. Also, to construct all possible rooted trees would require infinitely-many uses of the Recursion step.

Some of the terminology used with rooted trees is borrowed from “family trees,” and other terminology is borrowed from plant science.

The new root node $r$ added in the Recursion step is called the parent of each of the old root nodes, and each of the old root nodes is called a child of $r.$ The old root nodes collectively are called the children of $r.$
Two or more nodes with the same parent are called siblings.
A node that has one or more children is called an internal node.
A node that has no children is called a leaf. A leaf can also be called an external node.
The depth of node $v$ in a rooted tree is the length of the shortest path from the root node $r$ to the node $v.$ This is also called the level of the node $v.$
A level of a rooted tree is the set of all nodes at the same depth. For example, level 1 is the set of all the child nodes of the root node. Level 0 is the set containing only the root node.
The height of a rooted tree is the maximum of the depths of all the nodes in the rooted tree.
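The terminology above can be illustrated with a short Python sketch. It assumes that a rooted tree is stored as a dictionary that maps each internal node to the list of its children; this representation, the function name `depths`, and the sample tree are illustrative choices, not something the text prescribes.

```python
def depths(children, root):
    """Map each node to its depth: the number of edges on the path from the root."""
    depth = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            depth[child] = depth[node] + 1
            stack.append(child)
    return depth

# Hypothetical rooted tree: r has children a, b, c; a has children d, e; c has child f.
children = {"r": ["a", "b", "c"], "a": ["d", "e"], "c": ["f"]}
depth = depths(children, "r")

internal = sorted(node for node in depth if children.get(node))
leaves = sorted(node for node in depth if not children.get(node))
level_1 = sorted(node for node, d in depth.items() if d == 1)
height = max(depth.values())

print(internal)  # ['a', 'c', 'r']
print(leaves)    # ['b', 'd', 'e', 'f']
print(level_1)   # ['a', 'b', 'c']
print(height)    # 2
```

Here `d` and `e` are siblings because they have the same parent, `a`.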

The next lemma shows that you can construct any rooted tree without using recursion by instead starting with a tree then replacing all the undirected edges by appropriately directed edges.

Lemma

If $T$ is a rooted tree, then its underlying graph $G$ is a tree that has finitely many vertices.
If $G$ is a tree that has finitely many vertices, then $G$ is the underlying graph of a rooted tree $T.$

Proof outline

First, suppose that $T$ is a rooted tree. Use strong induction on the number of recursion steps needed to construct the rooted tree to prove that the underlying graph $G$ must be a simple graph such that for each pair of vertices, there is exactly one path between the vertices. In more detail, for a rooted tree that was constructed using the recursion step only one time, there is only one path between two vertices of the underlying graph: All paths must pass through the root node. For the induction step, any path in the underlying graph of a rooted tree constructed using the recursion step $k+1$ times will either pass through the new root node or one of the subtree’s root nodes (and the induction hypothesis applies to each underlying graph of the subtrees since they were constructed using fewer than $k+1$ applications of the recursion step.) Now use the theorem from the first section of the chapter to conclude that $G$ must be a tree, that is, $G$ must be a connected simple graph with no cycles. That $G$ must have finitely many vertices follows from the recursive definition of “rooted tree” since at most finitely many vertices are introduced each time the recursion step is used.

Secondly, suppose that $G$ is a tree that has finitely many vertices. Use strong induction on the number of vertices to prove that a rooted tree $T$ can be constructed so that $G$ is the underlying graph of $T.$ In more detail, choose any vertex $r$ of $G,$ which will become the root node of our rooted tree, then remove $r$ and any edges incident to $r$ to create a forest of subtrees $G_{1}, \, \ldots \, G_{n}$ where $n$ is the number of trees in the forest. Since each of the subtrees is either a single vertex (which is the basis case for the recursive definition of rooted trees) or a tree that contains fewer vertices than $G,$ we can apply the induction hypothesis to each of the subtrees to show that each subtree $G_{i}$ is the underlying graph of a rooted tree $T_{i}.$ Now apply the recursion step in the definition of rooted trees to reconnect the root nodes of the rooted trees $T_{1}, \, \ldots \, T_{n}$ to $r,$ replacing the undirected edges that had been removed by directed edges from $r$ to each of the root nodes of rooted trees $T_{1}, \, \ldots \, T_{n}.$ This shows that $G$ is the underlying graph of a rooted tree. It is important to note that you will get a different rooted tree if you choose a different root node $r,$ since the directed edges will “flow away” from the root node. That is, the directions of the edges will be different for different choices of root node, but the underlying undirected graph will be the same.

In summary, the lemma tells you that every rooted tree can be thought of as an adaptation of a tree, where you first select a vertex to be the root node and then replace all the edges of the tree by directed edges that “flow away” from the root node.
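The second half of the lemma suggests a procedure: pick a root, then direct every edge away from it. The following Python sketch does this with a breadth-first search rather than the recursion used in the proof outline; the function name `root_tree` and the sample tree are hypothetical, and the input is assumed to be the edge list of a tree that contains the chosen root.

```python
from collections import deque

def root_tree(edges, root):
    """Direct each undirected tree edge so that it points away from `root`.

    Returns a dict that maps every vertex to the list of its children.
    """
    adjacent = {}
    for u, v in edges:
        adjacent.setdefault(u, []).append(v)
        adjacent.setdefault(v, []).append(u)
    children = {root: []}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for neighbor in adjacent.get(parent, []):
            if neighbor not in children:  # not yet reached from the root
                children[neighbor] = []
                children[parent].append(neighbor)
                queue.append(neighbor)
    return children

# Hypothetical tree with four vertices.
edges = [("A", "B"), ("B", "C"), ("B", "D")]
print(root_tree(edges, "A"))  # {'A': ['B'], 'B': ['C', 'D'], 'C': [], 'D': []}
print(root_tree(edges, "B"))  # {'B': ['A', 'C', 'D'], 'A': [], 'C': [], 'D': []}
```

The two results have the same underlying undirected graph, but the edge between `A` and `B` is directed from `A` to `B` in the first rooted tree and from `B` to `A` in the second.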

15.3.2. Ordered Trees

Notice that in the Recursion step of the definition of rooted tree, the new root node is connected to each of the old root nodes, but since the subtrees' roots are members of a set, the order in which the new root node is connected to the old root nodes is not important. However, in some applications of rooted trees, it is important to note the order in which the new root node is connected to the old root nodes.

For example, in a family tree, you may want to represent a person by the root node of the tree, then represent their offspring by birth order as the child nodes. In this case, the order of the children matters. To do this, you can list the old subtrees as the sequence $T_{1}, \, \ldots \, T_{n}$ and list the old root nodes as the sequence $r_{1}, \, \ldots \, r_{n},$ where $n$ is the number of old subtrees used in the Recursion step.

In this subsection, a recursive definition for ordered trees is presented. This recursive definition takes the order of the indices of the old ordered trees into account when constructing the new ordered tree.
Warning: Many other names for ordered trees appear in various sources, such as ordered rooted tree, rooted plane tree, RP-tree, and decision tree. In fact, some sources even define “rooted tree” to mean what this textbook calls an “ordered tree.”

The set of ordered trees is defined recursively as follows.

Basis Step

A single vertex $r$ is an ordered tree. Vertex $r$ is the root node of this ordered tree.

Recursion

Suppose that you have already constructed the ordered trees in the sequence $T_{1}, \, \ldots \, T_{n}$ where $n$ is a positive integer and for each positive integer $i \leq n,$ the root node of $T_{i}$ is the vertex $r_{i}.$
If both of the propositions
(1) No vertex is in more than one of $T_{1}, \, \ldots \, T_{n}.$
(2) No edge has endpoints in two of $T_{1}, \, \ldots \, T_{n}.$
are True, then you can construct a new ordered tree by first creating a new root node $r$ that is not a vertex of any of the ordered trees in the sequence $T_{1}, \, \ldots \, T_{n}$ and then creating $n$ new directed edges, with one directed edge from $r$ to each of the old root nodes in the sequence $r_{1}, \, \ldots \, r_{n},$ in that order. The ordering of the children of the new root node $r$ corresponds to the increasing order of the subscripts of the old root nodes (which also allows you to use the same ordering for the old ordered trees.)

Ordered trees are usually drawn so that the root node appears at the top of the tree, and for each internal node, the children are drawn in order from left to right.

In summary, an ordered tree can be thought of as a special kind of rooted tree where the children of each internal node are ordered.

15.3.3. Isomorphisms: Rooted Trees and Ordered Trees

As discussed in the Graphs chapter, the definition of isomorphism can be adapted to include one-to-one correspondences between the vertex sets of two graphs that preserve specific features of a graph in addition to the adjacency relationships between vertices. Examples of such features are edge weights, edge directions, vertex colors, or edge colors.

In this subsection, examples are presented to show how the definition of isomorphism can be adapted for rooted trees and ordered trees.

Nonisomorphic Rooted Trees with Isomorphic Underlying Graphs

Here is a question for you: In the image, do the two drawings represent graphs or rooted trees? It’s okay if you are not sure how to answer this question. Since rooted trees are usually drawn with edges that do not use arrows to indicate the direction, the drawings are ambiguous, and you would probably need more context to decide whether the drawings represent two undirected graphs or two rooted trees.

Notice that the interpretation also affects whether the two drawings represent isomorphic objects.

Suppose that the drawings are interpreted as undirected graphs. These are isomorphic as undirected graphs, and also isomorphic as trees, since redrawing the graph on the left as the one on the right does not change the adjacency relationships of the vertices.
Suppose that the drawings are interpreted as rooted trees. The adjacency relationships are the same, but the direction of the edge with endpoints $A$ and $B$ depends on whether $A$ or $B$ is chosen as the root node. The underlying graphs with undirected edges are isomorphic, but the rooted trees are not isomorphic.

Nonisomorphic Ordered Trees that are Isomorphic Rooted Trees

In the image, two rooted trees are shown. The two rooted trees are isomorphic as rooted trees since the order of the children of the root node does not matter. However, these two rooted trees are not isomorphic as ordered trees since the corresponding children are not in the same order in each rooted tree.

It may be helpful to think of each of the two ordered trees as “telling a story” about a family.

On the left, a parent has three children, and the oldest child has three children, the middle child has no children, and the youngest child has one child.
On the right, a parent has three children, and the oldest child has no children, the middle child has three children, and the youngest child has one child.

In general, ordered trees are isomorphic if and only if they tell the same story about the families, so these two are not isomorphic as ordered trees. On the other hand, they are isomorphic as rooted trees because they tell the same story when the birth order of the children at level 1 is ignored: A parent has three children, of whom one has three children, another one has one child, and yet another one has no children.
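The two “family stories” can be compared mechanically. In the following Python sketch, a tree with unlabeled nodes is assumed to be written as a nested tuple that lists the subtrees of the root from left to right, so a leaf is the empty tuple. This encoding and the function name `canonical` are illustrative assumptions. Two ordered trees are then isomorphic exactly when their tuples are equal, and two rooted trees are isomorphic exactly when their tuples are equal after the children of every node are put into a standard order.

```python
def canonical(tree):
    """Return a form of `tree` in which the order of children has been forgotten."""
    return tuple(sorted(canonical(child) for child in tree))

LEAF = ()

# Left drawing: the oldest child has three children, the middle child has none,
# and the youngest child has one.
left = ((LEAF, LEAF, LEAF), LEAF, (LEAF,))

# Right drawing: the oldest child has none, the middle child has three,
# and the youngest child has one.
right = (LEAF, (LEAF, LEAF, LEAF), (LEAF,))

same_as_ordered_trees = left == right
same_as_rooted_trees = canonical(left) == canonical(right)

print(same_as_ordered_trees)  # False
print(same_as_rooted_trees)   # True
```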

15.4. Binary Trees

As mentioned near the beginning of this chapter, much of the terminology related to trees is not standardized. The definitions in this section use terminology consistent with Handbook of Graph Theory, Second Edition, but information on alternative definitions is also stated.

For any positive integer $m,$ an m-ary tree is defined to be an ordered tree in which each internal node has at most $m$ children.
Some sources define m-ary tree to be a rooted tree instead of an ordered tree.

A binary tree is a 2-ary tree, or more simply, an ordered tree in which each internal node has at most two children. In a binary tree, the root node is the parent of at most two subtrees $T_{1},$ called the left subtree, and $T_{2},$ called the right subtree. Notice that in the context of binary trees, it is allowable for either or both of the subtrees to be absent (that is, empty).
Some sources allow a binary tree to be “empty,” in the sense that it has no vertices or edges. An “empty binary tree” is not an ordered tree as defined above since it has no root node. However, an empty binary tree is useful in computer science applications. For example, when implementing binary trees as data structures, an empty binary subtree is represented by a null pointer (or null reference); a leaf node of a binary tree can be recognized as a node whose left child and right child pointers/references are both null.

A complete binary tree is a binary tree in which every internal node has exactly 2 children and all leaves are at the same level.
Some sources use either “perfect binary tree” or “full binary tree” to describe this type of binary tree and use the phrase “complete binary tree” to describe something else.

A binary tree is called balanced if for every vertex $v,$ the number of vertices in the left subtree of $v$ and the number of vertices in the right subtree of $v$ differ by at most one.
Some sources define a binary tree of height $h$ to be balanced if each leaf is at either level $h$ or level $h - 1.$
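The following Python sketch shows how the definitions of complete and balanced used in this textbook (not the alternative definitions from other sources) could be checked. It assumes the data-structure convention mentioned above, in which an absent subtree is represented by `None`; the class name `Node` and the function names are illustrative, and treating an empty subtree as having height $-1$ is a convention chosen for this sketch.

```python
class Node:
    """A binary tree node; a missing child is represented by None."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def size(node):
    """Number of vertices in the (possibly empty) subtree rooted at `node`."""
    return 0 if node is None else 1 + size(node.left) + size(node.right)

def height(node):
    """Height of the subtree rooted at `node`; an empty subtree gets -1."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def is_complete(node):
    """Every internal node has 2 children and all leaves are at the same level."""
    if node is None:
        return True
    return (height(node.left) == height(node.right)
            and is_complete(node.left) and is_complete(node.right))

def is_balanced(node):
    """At every vertex, the left and right subtree sizes differ by at most one."""
    if node is None:
        return True
    return (abs(size(node.left) - size(node.right)) <= 1
            and is_balanced(node.left) and is_balanced(node.right))

three = Node(Node(), Node())         # a root with two leaf children
two = Node(Node(), None)             # a root with only a left child
chain = Node(Node(Node()), None)     # three nodes in a row

print(is_complete(three), is_balanced(three))  # True True
print(is_complete(two), is_balanced(two))      # False True
print(is_complete(chain), is_balanced(chain))  # False False
```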

15.4.1. Tree Traversal Algorithms for Binary Trees

In many applications of binary trees, it is necessary to visit every node. The process of visiting every node is called tree traversal. In this subsection, you will learn about three commonly-used algorithms for tree traversal.

The three traversal algorithms can be described recursively as follows.

Preorder traversal: First, visit the root node. Secondly, visit each node in the root node’s left subtree using preorder traversal. Finally, visit each node in the root node’s right subtree using preorder traversal.
Inorder traversal: First, visit each node in the root node’s left subtree using inorder traversal. Secondly, visit the root node. Finally, visit each node in the root node’s right subtree using inorder traversal.
Postorder traversal: First, visit each node in the root node’s left subtree using postorder traversal. Secondly, visit each node in the root node’s right subtree using postorder traversal. Finally, visit the root node.

For each of these three traversal algorithms, the Base Step for the recursion is applied to a binary tree that has only one node (a root node that has empty left and right subtrees), and you visit just that one node.
Some sources use hyphens in the written names of these traversal algorithms, so list them as pre-order, in-order, and post-order traversal.
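The three recursive descriptions translate almost directly into code. In the following Python sketch, the class name `Node` and the sample tree are hypothetical. One difference from the description above: the sketch uses an empty subtree (represented by `None`) as the base case of the recursion instead of a single-node tree, which avoids having to check whether each child exists before recursing.

```python
class Node:
    """A binary tree node with a label and optional left and right children."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def preorder(node):
    """Root first, then the left subtree, then the right subtree."""
    if node is None:
        return []
    return [node.label] + preorder(node.left) + preorder(node.right)

def inorder(node):
    """Left subtree first, then the root, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.label] + inorder(node.right)

def postorder(node):
    """Left subtree first, then the right subtree, then the root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.label]

# Hypothetical tree: A has children B and C; B has children D and E;
# C has no left child and has right child F.
tree = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", None, Node("F")))

print(preorder(tree))   # ['A', 'B', 'D', 'E', 'C', 'F']
print(inorder(tree))    # ['D', 'B', 'E', 'A', 'C', 'F']
print(postorder(tree))  # ['D', 'E', 'B', 'F', 'C', 'A']
```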

The image displays the same binary tree with its nodes being “counted off” using each of these three traversal algorithms.

You may find the interactive demonstration at this Wikipedia page useful for understanding these three traversal algorithms.

15.4.2. Binary Search Trees

Define a key to be an element of a set that has a total ordering. For example, keys could be integers ordered using the usual $\leq$ relation. As another example, keys could be strings ordered alphabetically. In general, keys could be any type of data that can be totally ordered.

A binary search tree (BST) is a binary tree where each vertex is assigned a key in such a way that if key $k$ is assigned to vertex $v$ then both of the following are True.

the key $k$ is greater than each key assigned to a vertex in the left subtree of $v,$ and
the key $k$ is less than each key assigned to a vertex in the right subtree of $v.$

If you are given a list (or an array) of keys, a binary search tree can be used to sort the keys and then quickly search for a key. An advantage of binary search trees is that it is much easier and faster to maintain the sort order if you need to insert new keys than it would be to insert the same keys into a sorted list (or an array) of keys.

Example 2 - Constructing a Binary Search Tree

First, let’s construct a binary search tree from the list of keys $\left[ 17, 3, 5, 31, 2 \right].$

The first key in the list, $17,$ is assigned to the root node.
The next key, $3,$ is less than the key $17$ at the root node, so $3$ is assigned to the left child of the root node.
The next key, $5,$ is less than the key $17$ at the root node, so $5$ must be assigned to some node in the left subtree.
- Since $5$ is greater than the key $3$ at the left subtree’s root, $5$ is assigned to the right child of the left subtree’s root.
The next key, $31,$ is greater than the key $17$ at the root node, so $31$ is assigned to the right child of the root node.
The last key, $2,$ is less than the key $17$ at the root node, so $2$ must be assigned to some node in the left subtree.
- Since $2$ is less than the key $3$ at the left subtree’s root, $2$ is assigned to the left child of the left subtree’s root.

The following animation illustrates the construction of this binary search tree.

Now, let’s search for some keys. If a key is not already present, it will be inserted in the correct position.

Consider the key $5.$

$5$ is less than the key at the root node, which is $17.$ If $5$ is in this BST, it must be in the left subtree, so continue searching from there.
$5$ is greater than the key at the left subtree’s root node, which is $3.$ If $5$ is in this BST, it must be in the right subtree of this node, so continue searching from there.
$5$ is equal to the key at the subtree’s root node, so we have found this key!

Now consider the key $19$ which was not included in the original list.

$19$ is greater than the key at the root node, which is $17.$ If $19$ is in this BST, it must be in the right subtree, so continue searching from there.
$19$ is less than the key at the right subtree’s root node, which is $31.$ If $19$ is in this BST, it must be in the left subtree of this node, so continue searching from there.
This subtree is empty, so insert a new root node for this subtree and assign the key $19$ to this new node.

The following image shows the modified BST with the new key $19$ inserted in the correct position.

You Try

Locate the correct position of the new node when the key $11$ is inserted in the previously modified BST.

Answer

The key $11$ is assigned to the right child of the node that has the key $5.$ Click here to see an image of the BST with $11$ assigned to the new node.

Constructing a binary search tree from a list can serve two purposes. First, as shown in the preceding example, it is quicker to search for a key in the binary search tree because, in the best-case scenario when the tree is balanced, the number of keys that remain to be searched is halved with each iteration of the search. Second, the original list can be sorted by performing an inorder traversal of the binary search tree.
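Both purposes can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the tree from Example 2 (before $19$ and $11$ are inserted) is written out by hand as nested lists of the form `[key, left, right]`, with `None` for an empty subtree, and the function names `search` and `inorder` are hypothetical.

```python
# The BST built from [17, 3, 5, 31, 2], written as [key, left, right].
tree = [17,
        [3, [2, None, None], [5, None, None]],
        [31, None, None]]


def search(tree, key):
    """Return how many keys were compared before key was found, or None."""
    comparisons = 0
    while tree is not None:
        comparisons += 1
        if key == tree[0]:
            return comparisons
        # Continue in the left subtree for smaller keys, else in the right.
        tree = tree[1] if key < tree[0] else tree[2]
    return None


def inorder(tree):
    """Return the keys in inorder: left subtree, then root, then right subtree."""
    if tree is None:
        return []
    return inorder(tree[1]) + [tree[0]] + inorder(tree[2])


print(search(tree, 5))    # 3, because 5 is compared with 17, then 3, then 5
print(search(tree, 19))   # None, because 19 is not in this tree
print(inorder(tree))      # [2, 3, 5, 17, 31]
```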

Exercise - Traversing a Binary Search Tree

Consider the binary search tree with integer keys that was the solution in the “You Try” section of the previous example.

List the keys of the binary search tree using each of preorder, inorder, and postorder traversal to visit all nodes of the binary search tree.

Answer

Preorder: $\left[ 17, 3, 2, 5, 11, 31, 19 \right]$
Inorder: $\left[ 2, 3, 5, 11, 17, 19, 31 \right]$
Postorder: $\left[ 2, 11, 5, 3, 19, 31, 17 \right]$

Here is another exercise for you to try.

Check Your Understanding

Consider the following list of names, which will be used as alphabetically-ordered keys. \[ \left[ \text{Jun, Li, Chris, Elias, Sofia, Adil, Maya} \right] \]

First, construct and draw the binary search tree for this list of keys.

Answer

Click here to see an image of the BST.

Next, write down the three lists you get by using preorder, inorder, and postorder traversals to visit all nodes of the binary search tree.

Answer

Preorder: [ Jun, Chris, Adil, Elias, Li, Sofia, Maya ]
Inorder: [ Adil, Chris, Elias, Jun, Li, Maya, Sofia ]
Postorder: [ Adil, Elias, Chris, Maya, Sofia, Li, Jun ]

15.4.3. Expression Trees and Polish Notation

In the 1920s, the Polish logician Jan Łukasiewicz created a notational format for writing logical well-formed formulas that does not use any parentheses. Instead of writing logical operator symbols (like $\land$ and $\lor$) between two previously-constructed wffs, Łukasiewicz placed operator symbols in front of those wffs, which makes the use of parentheses completely unnecessary! Today, notation that is based on Łukasiewicz’s original notation is called either Polish notation or prefix notation and is applied to both arithmetic/algebraic formulas and logical formulas.

There is also the Reverse Polish notation, also called postfix notation, where the operator symbol is placed after its operands. The “usual” notation where the operator symbol is placed between the two operands is called infix notation.

Example 3 - Polish and Reverse Polish notation

Consider the following algebraic expression, written in infix notation. \[ 3 \cdot x - 5 \]

The Polish notation (prefix notation) for the same expression is \[ - \, \cdot \, 3 \, x \, 5 \] and the Reverse Polish notation (postfix notation) for the expression is \[ 3 \, x \, \cdot \, 5 \, - \]

You can use binary trees and tree traversal to make it easier to switch between these three notations. Define an expression tree to be a binary tree where each leaf is labeled by either a numeral or a variable symbol, and where each internal node is labeled by an operator symbol. The expression tree for the infix notation $3 \cdot x - 5$ is shown. Notice that in the expression tree for $3 \cdot x - 5,$ the root node corresponds to the “last” operation you would perform and the subtrees correspond to the previously-constructed expressions $3 \cdot x$ and $5.$ Also notice that, in the earlier example, the order of the symbols for the infix expression corresponds to the inorder traversal of the expression tree. It is an exercise for you to confirm that the order of the symbols in the Polish notation expression corresponds to the preorder traversal of the expression tree, and that the Reverse Polish notation expression corresponds to the postorder traversal of the expression tree.

Example 4 - Building And Traversing An Expression Tree

Consider the arithmetic expression. \[ 5 + 7 \times - 3 - 8 \div 2 \] Recall that it is not correct to evaluate this left-to-right; instead you need to use the order of operations, which is equivalent to evaluating the following expression that results after inserting parentheses. \[ (5 + (7 \times (- 3))) - (8 \div 2) \] From this second expression, you can see that the expression tree will have the minus sign at the root and subtrees that represent $(5 + (7 \times (- 3)))$ and $(8 \div 2).$ Continuing recursively, the subtree for $(5 + (7 \times (- 3)))$ will have the plus sign at its root and subtrees that represent $5$ and $(7 \times (- 3)),$ and the subtree for $(8 \div 2)$ will have the division sign at its root node and subtrees that represent $8$ and $2.$ Continue recursively until all symbols have been placed at a node.

The image displays a drawing of the expression tree for $(5 + (7 \times (- 3))) - (8 \div 2)$. Notice that the negative sign labels a node with only one child, which is treated as the left child (but as the right child for inorder traversal.)

Next, you can use each of the traversal methods to create the Polish (prefix), infix, and Reverse Polish (postfix) expressions.

$\text{prefix: } \ - \, + \, 5 \, \times \, 7 \, - \, 3 \, \div \, 8 \; 2 $

$\text{infix: } \ 5 \, + \, 7 \, \times \, - \, 3 \, - \, 8 \, \div \, 2 $

$\text{postfix:} \ 5 \, 7 \, 3 \, - \, \times \, + \, 8 \; 2 \, \div \, - $
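The three traversals of this expression tree can be checked with a short Python sketch. The class name `Node` and the three function names are hypothetical, and the ASCII symbols `*` and `/` stand in for $\times$ and $\div.$ Following the convention described above, the single child of the negative sign is stored as a left child but is visited after the sign during inorder traversal.

```python
class Node:
    """A node of an expression tree (hypothetical helper class)."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def preorder(node):
    if node is None:
        return []
    return [node.label] + preorder(node.left) + preorder(node.right)


def postorder(node):
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.label]


def inorder(node):
    if node is None:
        return []
    if node.left is not None and node.right is None:
        # Unary operator: its only child is visited after the operator symbol.
        return [node.label] + inorder(node.left)
    return inorder(node.left) + [node.label] + inorder(node.right)


# Expression tree for (5 + (7 * (-3))) - (8 / 2)
tree = Node("-",
            Node("+", Node("5"), Node("*", Node("7"), Node("-", Node("3")))),
            Node("/", Node("8"), Node("2")))

print(" ".join(preorder(tree)))    # - + 5 * 7 - 3 / 8 2
print(" ".join(inorder(tree)))     # 5 + 7 * - 3 - 8 / 2
print(" ".join(postorder(tree)))   # 5 7 3 - * + 8 2 / -
```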

You can see another example of how to construct expression trees, using Reverse Polish notation and a stack, at this Wikipedia page.

15.4.4. Isomorphisms: Binary Trees

In the image, two binary trees are shown. As binary trees are ordered trees, it should be clear that these cannot be isomorphic as binary trees because the order of the left and right children of the root node is different in the two trees.

15.5. Additional Topics To Be Added Later

Algorithms for Depth- and breadth-first traversals

16. Appendix: On-Demand Math Resources

This chapter was last updated on August 25, 2024.

This appendix discusses material that you have likely seen before but may need some review.

16.1. Linear Functions And Their Equations

A linear function is one that has a constant rate of change.

\begin{array}{|c|c|c|c|c|c|} \hline x & 1 & 2 & 3 & 4 & 5 \\ \hline y & 1 & 3 & 5 & 7 & 9 \\ \hline \end{array}

The table above displays a function with independent variable $x$ and dependent variable $y$.

Notice that the value of $y$ increases by $2$ for each increase in $x$ by $1$. The rate of change of this function is $2$; this corresponds to the slope $m$ of the continuous line that passes through the points with $xy$-coordinates given in the table.

The vertical intercept $b$ (in this case, the $y$-intercept) is the $y$-value that corresponds to $x = 0$, that is, $(0,\,b)$ is on the same continuous line as the points represented in the table. In this example, $0$ is not a value of $x$ in the table, but we can still find the vertical intercept by subtracting $1$ from the smallest $x$-value and subtracting $2$ from the corresponding $y$-value, which tells us that the point $(0,\,-1)$ lies on the same continuous line as the points represented in the table.

The equation of the linear function determined by the points in the table is $y = 2 \cdot x + (-1)$, which can be written more simply as $y = 2x - 1$. This also is the equation of the continuous line that passes through the points with $xy$-coordinates given in the table. However, the linear function can be restricted to a smaller domain as needed by the context where it is being used. For example, we may only need to use inputs $x$ from the set of positive integers or possibly just the set $\{ 1,\,2,\,3,\,4,\,5 \}$.
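The slope and vertical intercept for the table can also be computed in Python. This is a small sketch for checking the arithmetic; the variable names `xs`, `ys`, `rates`, `m`, and `b` are chosen here for illustration.

```python
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9]

# Rate of change between each pair of consecutive table entries.
rates = []
for i in range(len(xs) - 1):
    rates.append((ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]))

m = rates[0]             # the constant rate of change is the slope
b = ys[0] - m * xs[0]    # step back from the point (1, 1) to x = 0

print(rates)   # [2.0, 2.0, 2.0, 2.0]
print(m, b)    # 2.0 -1.0

# Every point in the table satisfies y = m * x + b.
print(all(y == m * x + b for x, y in zip(xs, ys)))   # True
```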

16.2. Arithmetic Sequences

An arithmetic sequence or arithmetic progression is a sequence of numbers $a_{0}, \, a_{1}, \, a_{2}, \, \ldots$ such that there is a constant $d$ so that \[ a_{i+1}-a_{i} = d \text{ for all } i \in \mathbb{N}\] The constant $d$ is called the common difference of the sequence. The sequence can be infinite, defined for every $i \in \mathbb{N},$ or finite, defined only for $i \in \mathbb{N}$ less than some greatest index $n$.

As an example, the sequence $1, 4, 7, 10, 13, 16$ is a finite arithmetic sequence with common difference 3 and index set $\{ 0, 1, 2, 3, 4, 5 \}.$ We can extend that sequence to an infinite arithmetic sequence $1, 4, 7, 10, 13, 16, \ldots$ using a recursive definition \[a_{0} = 1, \text{ and } a_{i+1} = a_{i} + 3 \text{ for all } i \in \mathbb{N} \]

Notice that there is also a nonrecursive definition for this sequence: Since the difference between two consecutive terms of the sequence is always $d$ the points $(i, \, a_{i})$ must lie on a line in the xy-plane. The slope of this line is $d$ and the y-intercept of the line is the initial value $a_{0},$ so the arithmetic sequence can also be described as \[ a_{i}= d \cdot i + a_{0} \text{ for all } i \in \mathbb{N}\]

For the example $1, 4, 7, 10, 13, 16, \ldots$, the nonrecursive definition is \[a_{i} = 3i + 1 \text{ for all } i \in \mathbb{N}\]
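The two definitions can be compared in Python. The function names below are hypothetical; the sketch only checks that the recursive and nonrecursive definitions produce the same first six terms.

```python
def arithmetic_recursive(a0, d, i):
    """Return a_i using a_0 = a0 and a_(i+1) = a_i + d."""
    if i == 0:
        return a0
    return arithmetic_recursive(a0, d, i - 1) + d


def arithmetic_closed(a0, d, i):
    """Return a_i using the nonrecursive formula a_i = d * i + a0."""
    return d * i + a0


print([arithmetic_recursive(1, 3, i) for i in range(6)])   # [1, 4, 7, 10, 13, 16]
print([arithmetic_closed(1, 3, i) for i in range(6)])      # [1, 4, 7, 10, 13, 16]
```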

16.3. Geometric Sequences

A geometric sequence or geometric progression is a sequence of numbers $a_{0}, \, a_{1}, \, a_{2}, \, \ldots$ such that there is a constant $r$ so that \[ a_{i+1} = r \cdot a_{i} \text{ for all } i \in \mathbb{N}\] The constant $r$ is called the common ratio of the sequence. The sequence can be infinite, defined for every $i \in \mathbb{N},$ or finite, defined only for $i \in \mathbb{N}$ less than some greatest index $n$.

As an example, the sequence $5, 10, 20, 40, 80$ is a finite geometric sequence with common ratio 2 and index set $\{ 0, 1, 2, 3, 4 \}.$ We can extend that sequence to an infinite geometric sequence $5, 10, 20, 40, 80, \ldots$ using a recursive definition \[a_{0} = 5, \text{ and } a_{i+1} = a_{i} \cdot 2 \text{ for integer } i \in \mathbb{N} \]

There is also a nonrecursive definition for this sequence: Since the ratio between a term and its predecessor in the sequence is always $r$ the points $(i, \, a_{i})$ must lie on the graph of an exponential function in the xy-plane. The y-intercept of the graph is the initial value $a_{0},$ so the geometric sequence can be described as \[ a_{i} = a_{0} \cdot r^{i} \text{ for all } i \in \mathbb{N}\]

For the example $5, 10, 20, 40, 80, \ldots$, the nonrecursive definition is \[a_{i} = 5 \cdot 2^{i} \text{ for all } i \in \mathbb{N}\]
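As a quick check, the following sketch builds the first five terms twice, once by repeatedly multiplying by the common ratio and once from the nonrecursive formula. The variable names are chosen here for illustration.

```python
a0, r = 5, 2

# Recursive definition: each term is r times its predecessor.
terms = [a0]
for i in range(4):
    terms.append(terms[-1] * r)

# Nonrecursive definition: a_i = a0 * r**i.
closed = []
for i in range(5):
    closed.append(a0 * r ** i)

print(terms)    # [5, 10, 20, 40, 80]
print(closed)   # [5, 10, 20, 40, 80]
```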

MORE TO COME!

17. Appendix: Library of Functions

This chapter was last updated on August 24, 2025.
repaired some typos

Recall that for any function,

the domain is the given set of input values for the function,
the codomain is a given set that contains all possible output values (but may contain other values that are not outputs, too), and
the range is the set that contains only the output values of the function.

In many cases, functions can be visualized graphically. For example, a function that maps the real line $\mathbb{R}$ to the real line can be viewed as a graph on a Cartesian plane.

17.1. Polynomial Functions

A polynomial is an algebraic expression of the form $a_{n}x^{n} + a_{n-1}x^{n-1} + \ldots + a_{1}x^{1} + a_{0}$, that is, $\sum\limits_{i=0}^{n}a_{i}x^{i}$, where n is a natural number, x is a variable, and $a_{n}, a_{n-1}, \ldots, a_{1}, a_{0}$ are real numbers. Examples of such expressions are

7, a constant,
$2x + 7$, a linear polynomial,
$3x^{2} + 2x + 7$, a quadratic polynomial.

A polynomial can be evaluated by substituting a number for each occurrence of x. For example, if we substitute $-1$ for x in each of the three polynomials above, we get

7 evaluated at $x = -1$ is 7,
$2x + 7$ evaluated at $x = -1$ is $2 \cdot (-1) + 7 = 5$,
$3x^{2} + 2x + 7$ evaluated at $x = -1$ is $3 \cdot (-1)^{2} + 2 \cdot (-1) + 7 = 8.$

In this way, every polynomial can be used to define a corresponding polynomial function with domain $\mathbb{R}$ and codomain $\mathbb{R}.$
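The following sketch shows one way such a polynomial function could be written in Python. The function name `evaluate` is hypothetical, and the sketch assumes the coefficients are stored in a list so that the item at index $i$ is $a_{i}.$

```python
def evaluate(coefficients, x):
    """Evaluate a_0 + a_1*x + ... + a_n*x**n, where coefficients[i] is a_i."""
    total = 0
    for i in range(len(coefficients)):
        total += coefficients[i] * x ** i
    return total


print(evaluate([7], -1))         # 7
print(evaluate([7, 2], -1))      # 5, since 2*(-1) + 7 = 5
print(evaluate([7, 2, 3], -1))   # 8, since 3*(-1)**2 + 2*(-1) + 7 = 8
```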

17.1.1. Quadratic Function

The function $f(x) = x^2$ denotes the association $(a,b) = (x, x^2)$ with $f : \mathbb{R} \rightarrow \mathbb{R}$. We notice that the range is the set of real numbers $[0, \infty) = \mathbb{R}^{+}$. The function is not invertible, since it is not injective. For example, we have both $f(-3) = 9$ and $f(3) = 9$. With $f : \mathbb{Z} \rightarrow \mathbb{Z}$, notice that the range is now the set of perfect squares $\{ 0, 1, 4, 9, 16, \ldots \}$, which is a subset of $\mathbb{N}$.

\begin{array}{lccccccccccc} & x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ & x^2 & 25 & 16 & 9 & 4 & 1 & 0 & 1 & 4 & 9 & 16 & 25 \end{array}

The graph of $x^2$

17.1.2. The Cubic function

The function $f(x) = x^3$ denotes the association $(a,b) = (x, x^3)$ with $f : \mathbb{R} \rightarrow \mathbb{R}$. Also, we notice that the range is the set of all real numbers $(- \infty , \infty) = \mathbb{R}$. The function is bijective and so invertible. With $f : \mathbb{Z} \rightarrow \mathbb{Z}$, notice that the range is the set of perfect cubes $\{ \ldots, -27, -8, -1, 0, 1, 8, 27, \ldots \}$, so the function is still injective but is no longer surjective.

\begin{array}{llcccccccccl} & x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ & x^3 & -125 & -64 & -27 & -8 & -1 & 0 & 1 & 8 & 27 & 64 & 125 \end{array}

The graph of $x^3$

17.1.3. The Square Root and Cube Root Functions

For the purposes of completeness and for comparing how fast functions $f(x)$ grow for large x, we present the inverses of the functions $f(x) = x^2$ and $f(x) = x^3$ when $f : \mathbb{R}^{+} \rightarrow \mathbb{R}^{+}$. These are, respectively, the functions $f(x) = \sqrt{x}$ and $f(x) = \sqrt[3]{x}$.

\begin{array}{lcccccccccclll} & x & 0 & 1 & 4 & 9 & 16 & 25 & 36 & 49 & 64 & 81 & 100 & 121 & 144 \\ & \sqrt{x} & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \end{array}

The graph of $\sqrt{x}$

\begin{array}{lcccccl} & x & 0 & 1 & 8 & 27 & 64 & 125 \\ & \sqrt[3]{x} & 0 & 1 & 2 & 3 & 4 & 5 \end{array}

The graph of $\sqrt[3]{x}$

17.2. Exponential and Logarithmic Functions

We begin by summarizing important properties of exponentials.

Properties of Exponentials

For $a > 0, a \neq 1$, $a^m \cdot a^n = a^{m+n}$. For example, $3^4 \cdot 3^5 = 3^{4+5} = 3^9$.
$\frac{a^m}{a^n} = a^{m-n}$. For example, $\frac{3^5}{3^2} = 3^{5-2} = 3^3$.
$\left(a^m\right)^n = a^{m \cdot n}$. For example, $\left(3^4\right)^3 = 3^{4 \cdot 3} = 3^{12}$.
$\left(a \cdot b\right)^m = a^m \cdot b^m$. For example, $\left(3x\right)^4 = 3^4 \cdot x^4$.
$a^0 = 1$.
$a^{-1} = \frac{1}{a}$. For example, $3^{-1} = \frac{1}{3}$.
$a^{\frac{1}{n}} = \sqrt[n]{a}$.
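These properties can be spot-checked in Python for particular numbers. The sketch below is only a numerical check, not a proof. It uses `Fraction` from the standard library so that quotients stay exact, and `math.isclose` for the root, because floating-point results are approximations.

```python
from fractions import Fraction
import math

print(3**4 * 3**5 == 3**9)                  # True: product rule
print(Fraction(3**5, 3**2) == 3**3)         # True: quotient rule
print((3**4)**3 == 3**12)                   # True: power of a power
print((3 * 2)**4 == 3**4 * 2**4)            # True: power of a product
print(3**0 == 1)                            # True: zero exponent
print(Fraction(3)**-1 == Fraction(1, 3))    # True: negative exponent
print(math.isclose(8 ** (1 / 3), 2))        # True: 8^(1/3) is the cube root of 8
```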

17.2.1. Exponential Functions

Exponential functions are of the form $f\left(x\right)=b^x$, where $b$ is the base and the variable $x$ is in the exponent. The base $b>0$ and $b ≠ 1$. Properties of exponential functions come from properties of exponents. When the base $b$ is greater than 1 the exponential function is increasing exponentially, as in the case $f(x) = 2^x$.

\begin{array}{llcccccccccl} & x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ & 2^x & \frac{1}{32} & \frac{1}{16} & \frac{1}{8} & \frac{1}{4} & \frac{1}{2} & 1 & 2 & 4 & 8 & 16 & 32 \end{array}

The graph of $2^x$

When the base $b$ is less than 1 the exponential function is decreasing exponentially, as in the case $f(x) = \left(\frac{1}{3}\right) ^x$.

\begin{array}{llcccccccccl} & x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ & (\frac{1}{3})^x & 243 & 81 & 27 & 9 & 3 & 1 & \frac{1}{3} & \frac{1}{9} & \frac{1}{27} & \frac{1}{81} & \frac{1}{243} \end{array}

The graph of $\left(\frac{1}{3}\right)^x$

17.2.2. Logarithmic Functions

Logarithmic functions are the inverse functions corresponding to exponential functions and are used to solve exponential equations. For example, $y=2^x$ is solved for $x$ by inverting the exponential to get $x=\log_2{y}$. Properties of logarithms follow from this relationship between exponentials and logarithms and properties of the exponentials.

We summarize three important properties of logarithms.

Properties of Logarithms

The exponential function $f\left(x\right)=y=b^x$, written in logarithmic form, is $\log_b{f\left(x\right)}=\log_b{y}=x$. Its inverse is the logarithmic function $x=b^y$, which is denoted $y=\log_b{x}$.
The power rule for logarithms states that $\log_b m^x=x\cdot \log_b m$.
Comparing the two solutions of $2^x = 5$, namely $x=\log_2{5}$ and $x=\frac{\log_{10}{5}}{\log_{10}{2}}$, gives $\log_2{5}=\frac{\log_{10}{5}}{\log_{10}{2}}$, which, essentially, is the change of base formula $\log_b{A}=\frac{\log_a{A}}{\log_a{b}}$.

All other properties of logarithmic functions come from properties relating the logarithm as the inverse of the exponential and the equivalence of the logarithm $a =\log_b m$ with $b^a=m$.
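The properties above can be checked numerically with Python's standard `math` module. Because logarithms are computed as floating-point approximations, this sketch compares values with `math.isclose` instead of `==`.

```python
import math

x = math.log2(5)   # the solution of 2**x == 5
print(math.isclose(2 ** x, 5))                                # True

# Change of base: log_2(5) = log_10(5) / log_10(2)
print(math.isclose(x, math.log10(5) / math.log10(2)))         # True

# Power rule with b = 2, m = 5, and exponent 3
print(math.isclose(math.log2(5 ** 3), 3 * math.log2(5)))      # True

# math.log accepts the base as an optional second argument.
print(math.isclose(math.log(8, 2), 3))                        # True
```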

When the base $b$ is greater than 1, the logarithm function is increasing, as in the case $f(x) = \log_2 x$.

\begin{array}{llllllcccccc} & x & \frac{1}{32} & \frac{1}{16} & \frac{1}{8} & \frac{1}{4} & \frac{1}{2} & 1 & 2 & 4 & 8 & 16 & 32 \\ & \log_2 x & -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \end{array}

The graph of $\log_2 x$

When the base $b$ is less than 1, the logarithm function is decreasing, as in the case $f(x) = \log_{\frac{1}{3}} \ x$.

\begin{array}{llllllcccccl} & x & \frac{1}{243} & \frac{1}{81} & \frac{1}{27} & \frac{1}{9} & \frac{1}{3} & 1 & 3 & 9 & 27 & 81 & 243 \\ & \log_{\frac{1}{3}} x & 5 & 4 & 3 & 2 & 1 & 0 & -1 & -2 & -3 & -4 & -5 \end{array}

The graph of $\log_{\frac{1}{3}} \ x$

17.3. The Floor and Ceiling Functions

The floor and ceiling functions round a real number input to an integer.

The floor of x, written as $\lfloor x \rfloor,$ is the greatest integer that is less than or equal to x. In older textbooks you may see this function named as the greatest integer function and denoted by $[ x ] .$ For example, $\lfloor -1.5 \rfloor = -2$.
The ceiling of x, written as $\lceil x \rceil,$ is the least integer that is greater than or equal to x. For example, $\lceil -1.5 \rceil = -1$.

On a number line, the floor of x and the ceiling of x satisfy $\lfloor x \rfloor \leq x \leq \lceil x \rceil$. When x is not an integer, they are the two consecutive integers on either side of x; when x is an integer, both are equal to x.

The floor and ceiling functions are step functions: In the plane, their plots look like they are made of horizontal steps.

Note that the plot of the floor function, shown in green, is always at the same height or below the graph of the line $y = x$, and that the plot of the ceiling function, shown in red, is always at the same height or above the graph of the line $y = x.$
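In Python, the floor and ceiling functions are available as `math.floor` and `math.ceil` in the standard library. The short sketch below prints each sample input followed by its floor and its ceiling; the sample inputs are chosen here for illustration.

```python
import math

for x in [-1.5, 2.0, 3.7]:
    print(x, math.floor(x), math.ceil(x))
# -1.5 -2 -1
# 2.0 2 2
# 3.7 3 4
```

Notice that the floor and ceiling agree when the input is already an integer value, as with $2.0.$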

17.4. Other Functions

MORE TO COME!!

18. Appendix: An Introduction to Python

18.1. Programming Basics

Computers are programmable machines that process information by manipulating data. As the data can represent any real world information, and programs can be readily changed, computers are capable of solving many kinds of problems.

18.1.1. Programming Languages and Environments

There are many different programming languages for programmers to choose from. Each language has its own advantages and disadvantages, and new languages gain popularity while older ones slowly lose ground. In this book, we use the Python 3 programming language. It is popular in both academia and industry, and was designed with education in mind.

18.1.2. PythonTutor

PythonTutor is an environment for creating very short and simple Python programs and visualizing their execution. This enables beginners to visually see the data as it gets manipulated by the instructions.

Example 1 - A Simple Program

Use the Next button to step through the program below and watch the data get created and modified. Notice how the arrows move to indicate what instruction the program execution is on.


18.1.3. Comments

Program files can contain source code and comments. Comments are not instructions for the computer to follow, but instead notes for programmers to read. Comments in Python start with a pound sign (#). Anything following the pound sign that is on the same line as the pound sign will not be executed. Often, at the very beginning of a program, comments are used to indicate the author and other information. Comments are also used to explain tricky sections of code or disable code from executing.

# This line is not Python code, it is a comment.

score = 9001 # over 9000!!!

# The next line of code is disabled because it starts with a #.
# score = 8000

18.2. Data Types

Programming is all about information processing. Information is categorized by data types. Four basic data types we will be considering are int, float, bool, and str. Int consists of integers, which are whole numbers written without a decimal point. This includes positive and negative whole numbers as well as zero. Float consists of floating-point numbers, which are numbers that are written with a decimal point. Bool consists of Boolean values (named after the mathematician George Boole). The only Boolean values are True and False. Str consists of strings, which are sequences of text characters including punctuation, symbols, and whitespace. Every value in Python has a corresponding data type. The table below shows examples of ints, floats, bools, and strings.

Table 2. Basic Data Types
Data Type	Example Values
int	2, -2, 0, 834529
float	3.14, -2.3333, 7.0
bool	True, False
str	"Hello World!", 'Coconut', "0", '4 + 6'

Strings and Quotation Marks

Strings are always surrounded by quotation marks. Python allows either single (') or double (") quotation marks for single line strings. Python also allows triple quotation marks (either ''' or """) for a string that spans multiple lines.
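You can ask Python for the data type of any value with the built-in `type()` function. The sketch below uses a few of the example values from the table; the variable names `values` and `poem` are chosen here for illustration.

```python
values = [2, 7.0, True, "0", '4 + 6']
for value in values:
    print(repr(value), type(value).__name__)
# 2 int
# 7.0 float
# True bool
# '0' str
# '4 + 6' str

# Triple quotation marks allow a string to span more than one line.
poem = """Roses are red,
violets are blue."""
print(len(poem.splitlines()))   # 2
```

Notice that "0" and '4 + 6' are strings, not numbers, because they are surrounded by quotation marks.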

18.3. Variables

Variables are (virtual) boxes that store values for reuse later. A variable has a name and a current value. Each variable can only hold one value at a time. Variables are assigned a value using the single equal sign (=). As Python executes one line at a time, variables come into existence on the line where they are first assigned. Each variable only stores the most recent value assigned to it.

Example 2 - Basic Variables and Data Types

Use the Next button to step through the program below and watch the variables get created.


Variable Names

Variables can have complex names like player1_score. In general, never start a variable name with a number and never use spaces in variable names.

18.4. Operators and Expressions

Example 3 - Numerical Operators and Expressions

Try to predict the variable names, values, and data types in the code below.


Expression Evaluation

When Python encounters a line with an expression, it always evaluates the expression first. Consider the following line of code:

x = (3 + 4) * 2

Python first calculates the value of the expression to the right of the equal sign by using the standard order of operations starting inside the parentheses. The value given by the above expression is calculated to be equal to 14. Then, Python creates the variable x and assigns the value 14 to this variable. The variable only stores the calculated value, not the entire expression that generated that value.

Example 4 - Boolean Operators and Expressions

Note how each expression returns a Boolean value. These are called Boolean expressions.


18.5. Strings and Printing

Besides creating and storing values in variables, we can also output text on a screen by calling the print() function.

Example 5 - Strings and print()

Try to predict the printed output. Look at the small window in the top-right as you use the Next button.


18.6. If Statements

A block of code is a collection of lines of code that are either all executed (in sequential order) or all skipped. Blocks always start with a colon (:) on the previous line and require every line in the block to be indented the same amount using tabs or spaces. One way in which Python can execute or skip over a block involves using an if command and a Boolean expression. If the expression is true, then the block executes. Otherwise, the block is skipped.

Example 6 - If Statements

Notice that all un-indented lines and the second block execute, while the first block does not execute.
Which blocks execute if age = 22? What about if age = 15?


When you want to force exactly 1 of 2 blocks to execute (as opposed to just skipping a block), you can use the else command in addition to the if command. If the expression following the if command is true, then the first block executes. Otherwise, the second block executes.

Example 7 - If-Else Statements

No matter how you change the scores, only 1 print() function executes.
Try making the scores the same.


In order to force exactly 1 of more than 2 blocks to execute, you can use the elif command in addition to the if and else commands. Each elif command must be followed by a Boolean expression. When using if and elif commands, each expression is checked in sequential order, and the block following the first true expression executes. If none of the expressions are true, the block following the else command is executed.
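Here is a minimal sketch of this behavior. The function name `describe` and the speed thresholds are made up for illustration, and the code is wrapped in a function (functions are covered in a later section) so that several values can be tried.

```python
def describe(speed):
    """Return a label for a speed (hypothetical example values)."""
    if speed > 80:
        label = "very fast"
    elif speed > 60:
        label = "fast"
    elif speed > 30:
        label = "moderate"
    else:
        label = "slow"
    return label


# For 90, the expressions speed > 60 and speed > 30 are also true,
# but only the block after the first true expression executes.
print(describe(90))   # very fast
print(describe(45))   # moderate
print(describe(10))   # slow
```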

Example 8 - If-Elif-Else Statements

Even though several of the Boolean expressions are true when temp = 83, only the block after the first such expression executes.
Try several different values of temp and see what is printed.


18.7. While Statements

Python can execute a block repeatedly using a while statement and a Boolean expression. The block repeats until the Boolean expression is false.

Example 9 - While Statements

What numbers do you think will print? Notice that, without line 3, the loop would run forever.


The += operator increases the value of the variable written to the left of the operator by the value written to the right of the operator.
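As a small illustration, the following sketch uses += inside a while loop. The variable names `count` and `printed` are chosen here for illustration, and the values are collected in a list so they can be inspected after the loop ends.

```python
count = 1
printed = []
while count <= 5:
    printed.append(count)
    count += 1   # same as count = count + 1; without it the loop never ends

print(printed)   # [1, 2, 3, 4, 5]
print(count)     # 6
```

The loop stops when count reaches 6, because the Boolean expression count <= 5 is then false.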

18.8. Lists and Loops

When you need to consider many values at once, use a list.

Example 10 - List Indexing

Try index -2.


When you want to consider every value in a list, use a for loop.

Example 11 - For Loops With Indices

What does the variable i represent?
What line creates the variable i?
What line modifies it?


The range() function returns a sequence of numbers. The sequence starts at the value given by the first argument, increments by 1, and ends at one less than the value given by the second argument. For example, range(2,5) returns 2,3,4. If only one argument is given, that argument is considered the second argument and the first argument is set to 0 by default. For example, range(4) returns 0,1,2,3.
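To see the numbers in a range, you can convert it to a list. The sketch below checks the two examples just described and then uses a range in a for loop; the variable name `total` is chosen here for illustration.

```python
print(list(range(2, 5)))   # [2, 3, 4]
print(list(range(4)))      # [0, 1, 2, 3]

total = 0
for i in range(2, 5):
    total += i
print(total)               # 9, which is 2 + 3 + 4
```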

Example 12 - For Loops Without Indices

What line creates the variable x?
What line modifies it?


Example 13 - Summing with Loops

What line creates the variable g?
What line modifies it?


18.9. List Appending and Slicing

We can append to lists with the concatenation operator (+). We can also slice a list using the bracket notation and two indices separated by a colon (:). The first index specifies the starting point of the slice, while the second index is one more than the index of the last item in the slice.
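A short sketch of both operations follows. The list contents and variable names are made up for illustration.

```python
letters = ["a", "b", "c", "d", "e"]

longer = letters + ["f"]   # concatenation builds a new, longer list
middle = letters[1:4]      # items at indices 1, 2, and 3 (index 4 is excluded)

print(longer)    # ['a', 'b', 'c', 'd', 'e', 'f']
print(middle)    # ['b', 'c', 'd']
print(letters)   # ['a', 'b', 'c', 'd', 'e'], the original list is unchanged
```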

Example 14 - List Appending and Slicing

Try to predict the variable names, values, and data types in the code below. Use the Next button to check your answers.


18.10. Lists versus Arrays

Python has both lists and arrays. Lists are convenient because the items in the list can be of different data types, but all items in an array must have the same data type. Arrays are more efficient because the items are stored in the array, but a list stores only a reference (or pointer) to the actual item. For the purposes of the MKD Remix, lists are preferred, but be aware that in some cases an array may be a better choice than a list. You can read more about arrays here.
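The difference can be seen in a short sketch. It assumes that "array" refers to the `array` module in Python's standard library, which may not be the only kind of array intended here. The type code "i" says that every item must be a signed integer.

```python
from array import array

mixed = [3, 30.5, "Hello World"]     # a list may mix data types
numbers = array("i", [3, 30, 300])   # every item in this array must be an int

numbers.append(3000)
print(numbers[1], len(numbers))      # 30 4

try:
    numbers.append("Hello World")    # not an int, so Python rejects it
except TypeError as error:
    print("TypeError:", error)
```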

18.11. Defining Custom Functions

In the examples above we have called several functions like print() and len(). You can define your own functions using def. A function definition includes zero or more parameter variables. The values of those parameter variables are referred to as the arguments of the function.
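For instance, here is a minimal sketch of a function definition and two calls. The function name `add` and the values used are made up for illustration.

```python
def add(a, b):        # a and b are the parameter variables
    total = a + b
    return total


result = add(2, 5)    # 2 and 5 are the arguments for this call
print(result)         # 7
print(add(10, -3))    # 7
```

Each time the function is called, the parameter variables a and b are created and given the values of the arguments in that call.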

Example 15 - Defining Functions

What line creates the variables a and b?
When does that line execute?
How many times?
Where do the variables a and b get their values from?


18.12. Exercises

Given the following Python code, what is the value and data type of each variable?

a = 6 + 8
large = a // 4
b = 22 // 3
c = 22 % 3
d = False or True
e = True and False
sheep = (True or (b > 10))

Given the following Python code, determine the printed output.

print("Hello World!")
a = "The answer is"
b = 6 * 7
print(a, b)
print(False, "Hobbit", 1, "Ring")

For the following code, determine the value of the variable letter when the score is 92, 84 and 59.

score = 92 # an integer between 0 and 100; also try 84 and 59
if score >= 90:
	letter = 'A'
elif score >= 80:
	letter = 'B'
elif score >= 70:
	letter = 'C'
elif score >= 60:
	letter = 'D'
else:
	letter = 'F'

For the following code, determine the value of the variable ans for each case given below.
```
if outside == False:
	if (n >= 2 and n <= 20):
		ans = True
	else:
		ans = False
else:
	if (n <= 2 or n >= 20):
		ans = True
	else:
		ans = False
```
1. n = 3, outside = False
2. n = 15, outside = False
3. n = 15, outside = True
4. n = 12, outside = True

What will this code print out?

while count > 0:
	print("Welcome")
	count -= 1

Write Python code to satisfy the following conditions. Then test your code on the values of the variables given.
1. Given an int n, return the absolute difference between n and 10, except return triple the absolute difference if n is over 10. It should return 1 when n=9. It should return 33 when n=21. What will the code return when n=7 or n=35?
2. We have a loud talking robot. The "hour" parameter is the current hour time in the range 0 to 23. We are in trouble if the robot is talking and the hour is before 6 or after 21. Return True if we are in trouble. It should return True when the robot is talking and the hour is 5. It should return False when the robot is not talking and the hour is 8. What does it return if the robot is talking and the hour is 9?

What will the following code print out?

numbers = [1, 3, 5, 7, 10]
sq = 0
for val in numbers:
	sq = val * val
	print(sq)

What will the following code print out?
```
for i in range(1, 20, 2):
	print(i)
```

Use the following definition of the function front3() to find the output of the program for the list [1, 3, 5, 7].

def front3(nums):
	i = 0
	while (i < len(nums) and i < 5):
		if nums[i] == 3:
			return True
		i += 1
	return False

Write a function that takes, as input, two lists of integers, a and b, both of length 3, and returns, as output, a new list of length 2 containing the last elements of a and b. For example, if a = [1, 2, 3] and b = [10, 20, 30], then the function should return the list [3, 30].

19. Appendix: Python Syntax Examples

# the '#' character makes a COMMENT separating Python from English
x = 3 # create the VARIABLE with NAME x and STORE INT VALUE 3
Sebastien_Score = 9001 # variable names can be long, but no spaces!
y = 1.0 * 3 # y stores EXPRESSION's RETURN FLOAT value 3.0
z = "Hi There!" # z stores a STRING value
w = False # w stores a BOOLEAN value
v = [3, 30, "Hello World"] # v stores a LIST of values
print(z) # print function displays output ("Hi There!")

# Maths
a = 3
b = 3.0 # b stores 3.0 (float values are decimal approximations)
c = 7 // 2 # c stores 3 (int division always gives ints)
d = 7 % 2 # d stores 1 (Mod or Remainder of the division)
a = 5 # change the value of a to 5
a += 1 # INCREMENT the value of a by 1 (to 6)

# Boolean Operators
a = (3 > 2) # a stores True because 3 is greater than 2
a = (2 >= 2) # a stores True because 2 is greater than or equal to 2
a = (3 < 2) # a stores False because 3 is not less than 2
a = (2 <= 2) # a stores True because 2 is less than or equal to 2
a = (3 != 2) # a stores True because 3 is not equal to 2
a = (3 == 3) # a stores True because 3 is equal to 3
a = (True and False) # a stores False, AND returns True only when both sides are True
a = (True or False) # a stores True, OR returns True if at least 1 side is True
a = (not False) # a stores True, NOT returns opposite

# BLOCKS are sections of any code chunked together with INDENTATION
# BLOCKS start with a ':' and continue with each INDENTED line
x = 7
if x > 8: # if CONDITION is True, then execute block, otherwise skip block.
    print("Hello") # since x stores 7, this will skip
    print("I Am Sam.") # since x stores 7, this will skip
elif x > 2: # elif condition is True AND previous if was False, execute block
    print("Hi") # since x stores 7, this will execute
    print("I am Sally.") # since x stores 7, this will execute
else: # if all previous conditions are False, execute block.
    print("Yo") # since x stores 7, this will skip
    print("I'm Bob.") # since x stores 7, this will skip

while x > 3: # repeat a block until condition becomes False
    print("Apples")
    x += -1

# Lists store multiple values
a = [10, 30, 20, 90] # create a new list
x = len(a) # x stores 4 (the length)
b = a[0] # INDEX into the list, 0 is first value, b stores 10
c = a[3] # c stores 90
d = a[-1] # -1 is last value, d stores 90
e = a[1:3] # slice a from index 1 up to index 3, e stores [30, 20]
a[1] = 50 # modify the second element in the list, a is now [10, 50, 20, 90]
f = a + [5, 15] # f stores [10, 50, 20, 90, 5, 15], CONCATENATION not addition
g = list(range(0, 4)) # range counts from 0 up to (not including) 4, list() stores the values, g stores [0, 1, 2, 3]

# For Loops
for c in "Elephant!": # repeat block with c storing each character 1 at a time
    print(c) # prints one letter per line

for x in [10, 30, 20]:
    print(x) # prints one number per line

# Custom Functions
def myfunc(a, b): # DEFINES a new function that takes 2 INPUT PARAMETER values
    c = 2 * a + b # executes only when function is called
    return c # RETURNS a value back to the calling code

x = myfunc(10, 5) # Calls the myfunc() function, x stores return value 25
y = myfunc(1, 3)  # Calls the myfunc() function, y stores return value 5

20. Appendix: For Instructors

CAUTION - CHAPTER UNDER CONSTRUCTION!

This chapter was last updated on August 24, 2025 (minor wording changes).

In this appendix, the remixer shares some ideas that other instructors may find useful.

20.1. Incorporating Lessons from the Discrete Mathematics Project

Here is how I, the remixer, have tried to use the “team-worthy” activity-based lesson materials developed by the Discrete Mathematics Project (DMP) in the course that I teach using the Remix.

It appears that one of the purposes of these lessons is to have each student play one specific role (either Documenter, Reporter, Facilitator, or Questioner) for their team during each lesson. The aim is to give students the experience of being assigned such a role, which may happen in their professional lives.

The order of the lessons in the list below reflects how the remixer has used them throughout a one-semester course.

Introduction to Teamworthy Tasks - Counting (DMP Lesson 1)	This lesson involves counting arguments and seems to have been designed as an accessible entry point for all students. However, in my experience, when teams are formed, some students on each team may already know how to solve the problems while others may not. This can lead team members to drop the roles they are assigned for the activity and focus instead on “getting the answers.” It may be worth using another activity whose content is more likely to be unfamiliar to almost all students. I may swap the content of this lesson with that of the Handshake Lemma lesson (that is, use the introductory materials about team-worthy tasks, but replace the activity content with the Handshake Lemma lesson’s content).
Handshake Lemma (DMP Lesson 3)	This lesson can be used to introduce the concept of conjecture, lay the groundwork for proof, and provide a nonarithmetic/nonalgebraic example of proof by mathematical induction that is referred to later when mathematical induction is formally introduced. The graph theory content needed by students is minimal and can be learned just-in-time, so this lesson can be used very early in the course. NOTE: It may be important to mention to students that this activity does not involve literally shaking the hands of other people. A possible extension is to use directed edges. Also, using this lesson as the first one in a semester may have an advantage over the counting problems included in the Introduction to Teamworthy Tasks lesson: the graph theory content is likely to be unfamiliar to most students, so team members may be more likely to adhere to their specific team roles.
Logic & Inference (DMP Lesson 5)	This lesson focuses on using rules of inference to make a valid argument and on analyzing invalid arguments (e.g., ones that include a converse or inverse error). Some teams may try to use AI to solve the problem, which often creates an opportunity to talk about common logical fallacies (i.e., the converse error and the inverse error). In fact, the lesson includes examples in which AI has made these common errors.
Algorithms & Recursive Functions (DMP Lesson 7)	This lesson introduces algorithms and their analysis using recurrence relations and recursively-defined functions, and the comparison of the linear search and binary search algorithms for sorted lists. I use it to introduce recursion and recurrence relations, and the need to follow an algorithm carefully, step-by-step. I use this several weeks before formally introducing the search algorithms and their complexity analysis.
Binomial/Number Triangle (DMP Lesson 2)	I do this lesson after recurrence relations to try to focus on the relationships in the number triangle. Try to stress the reasoning needed to justify Pascal’s identity, as many students will focus on "getting the right numbers" instead.
Mathematical Induction (DMP Lesson 6)	Teams use an assembly-line approach: each team starts one proof, completes a second proof started by another team, and then verifies a third proof completed by another team. This activity works well with trios of teams completing each other’s proofs. In my experience, the student teams find the problems with algebraic equations the easiest (though still far from easy). Most undergraduate students have no experience using the kind of nesting arguments needed to work with inequalities. I’ve tried to use some "visual questions" (say, about properties of star graphs), but the teams would need much more practice with recursive definitions than they’ve had up to and including this course.
Ramsey/Dot Game (DMP Lesson 4)	I use this as the final activity, typically with only 3 students per team. The content involves graph edge colorings, presented as a game. I have found that the game context makes this lesson highly engaging. At this point in the semester, students are used to working in their team roles so this engagement does not lead them to abandon those roles.
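
As a rough illustration of the comparison made in the Algorithms & Recursive Functions lesson described above, the following sketch counts how many list elements each search examines on a sorted list. The function names and the counting convention are assumptions made here for illustration; they are not taken from the DMP lesson materials.

```python
def linear_search_steps(sorted_list, target):
    """Return (index, number of elements examined) for a left-to-right scan."""
    steps = 0
    for index, value in enumerate(sorted_list):
        steps += 1
        if value == target:
            return index, steps
    return -1, steps


def binary_search_steps(sorted_list, target):
    """Return (index, number of elements examined) by halving the search range."""
    low, high = 0, len(sorted_list) - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1
        if sorted_list[mid] == target:
            return mid, steps
        if sorted_list[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1, steps


data = list(range(1, 1025))  # the sorted list 1, 2, ..., 1024
print(linear_search_steps(data, 1024))
print(binary_search_steps(data, 1024))
```

On this 1024-element list, the linear scan examines 1024 elements to find the last value, while the binary search examines 11.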

\(p\)	\(q\)	AND (&)	OR (|)	XOR (^)	IF (\(\Rightarrow\))	IFF (\(\Leftrightarrow\))
1	1	1	1	0	1	1
1	0	0	1	1	0	0
0	1	0	1	1	1	0
0	0	0	0	0	1	1
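
The operator symbols in the header row of the table above match Python's bitwise operators on the integers 0 and 1, so the rows can be regenerated with a short sketch. Python has no built-in operators for IF and IFF; the expressions `(1 - p) | q` and `1 - (p ^ q)` used here are one possible encoding, chosen for illustration, and the function name is hypothetical.

```python
def truth_table_rows():
    """Return rows (p, q, AND, OR, XOR, IF, IFF) for all 0/1 values of p and q."""
    rows = []
    for p in (1, 0):
        for q in (1, 0):
            rows.append((
                p,
                q,
                p & q,         # AND
                p | q,         # OR
                p ^ q,         # XOR
                (1 - p) | q,   # IF: 0 only when p is 1 and q is 0
                1 - (p ^ q),   # IFF: 1 exactly when p and q agree
            ))
    return rows


for row in truth_table_rows():
    print(*row)
```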

\(A\)	\(B\)	\(C_{in}\)	\(\mathbf{S}\)	\(\mathbf{C_{out}}\)
1	1	1	\(\mathbf{1}\)	\(\mathbf{1}\)
1	1	0	\(\mathbf{0}\)	\(\mathbf{1}\)
1	0	1	\(\mathbf{0}\)	\(\mathbf{1}\)
1	0	0	\(\mathbf{1}\)	\(\mathbf{0}\)
0	1	1	\(\mathbf{0}\)	\(\mathbf{1}\)
0	1	0	\(\mathbf{1}\)	\(\mathbf{0}\)
0	0	1	\(\mathbf{1}\)	\(\mathbf{0}\)
0	0	0	\(\mathbf{0}\)	\(\mathbf{0}\)
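
The rows above match the behavior of a one-bit full adder with inputs \(A\), \(B\), \(C_{in}\) and outputs \(S\), \(C_{out}\). A minimal sketch in Python, assuming the usual XOR/AND/OR formulation; the function name `full_adder` is hypothetical:

```python
def full_adder(a, b, c_in):
    """Return (s, c_out) for one-bit inputs a, b, and c_in."""
    s = a ^ b ^ c_in                    # sum bit: 1 when an odd number of inputs are 1
    c_out = (a & b) | (c_in & (a ^ b))  # carry bit: 1 when at least two inputs are 1
    return s, c_out


for a in (1, 0):
    for b in (1, 0):
        for c_in in (1, 0):
            s, c_out = full_adder(a, b, c_in)
            # The two output bits together encode the arithmetic sum a + b + c_in.
            assert 2 * c_out + s == a + b + c_in
            print(a, b, c_in, s, c_out)
```

The loop prints the eight rows in the same order as the table, and the assertion checks that \(C_{out}\) and \(S\) together form the two-bit sum of the three inputs.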

Basis Step	Prove the predicate \(P(n)\) is True for some small value of \(n.\) In most but not all cases, you prove either \(P(0)\) or \(P(1).\) You can also prove \(P(n)\) for finitely-many other values if it helps you get a feel for what needs to be proven, as was done in “the sum of the first \(n\) consecutive odd natural numbers is the perfect square \(n^{2}\)” in the previous section.
Induction Step	Prove that the conditional statement \(P(k) \rightarrow P(k+1)\) is True for any natural number \(k\) that is greater than or equal to the value used in the Basis Step. In this context, the predicate \(P(k)\) is called the induction hypothesis and is assumed to be True, where \(k\) represents an arbitrary natural number. By assuming that \(P(k)\) is already True and also proving the conditional statement \(P(k) \rightarrow P(k+1),\) you can use modus ponens to infer that \(P(k+1)\) must also be True: You can “step up” from any natural number to the next larger natural number and maintain the truth value of the predicate.
Conclusion Step	Conclude that \(P(n)\) is True for all natural numbers \(n\) that are greater than or equal to the smallest value used in the Basis Step. Some sources do not list the Conclusion Step as part of a proof by mathematical induction, but the Remix includes it to emphasize that this step must be done to complete the proof.
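
The claim quoted in the Basis Step, that the sum of the first \(n\) consecutive odd natural numbers is \(n^{2}\), can be spot-checked for small values of \(n\). The sketch below is only a finite check and is not a proof (the Induction Step is what covers all \(n\)); the function name is made up for illustration.

```python
def sum_of_first_odd_numbers(n):
    """Return 1 + 3 + 5 + ... + (2n - 1), the sum of the first n odd natural numbers."""
    return sum(2 * i - 1 for i in range(1, n + 1))


# Check P(n): "the sum of the first n odd natural numbers equals n**2" for n = 1..10.
for n in range(1, 11):
    assert sum_of_first_odd_numbers(n) == n ** 2

# The arithmetic behind the Induction Step: adding the next odd number, 2k + 1,
# to k**2 gives (k + 1)**2.
for k in range(1, 11):
    assert k ** 2 + (2 * k + 1) == (k + 1) ** 2

print(sum_of_first_odd_numbers(5))  # 1 + 3 + 5 + 7 + 9 = 25
```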
