Higher-order functions are functions that either take in other functions for their input, or return other functions for their output. Let's take a closer look at when they're useful, and how to write them.

So far we've written a lot of functions that take in numbers for their input. `square(x)`

and `nth_prime(n)`

are two good examples, which respectively take in the numbers `x`

and `n`

for their inputs.

There are also functions that are meant to take in functions for their input, rather than numbers. One example is the derivative function, from calculus. The derivative of a function, denoted mathematically like \frac{d}{dx}f(x), takes the function f(x) for its input and returns a different function that represents the rate of change in f(x). You could also hypothesize a function like `is_quadratic`

, which could take in a function f(x) for its input and return `True`

or `False`

depending on whether or not f(x) can be expressed in the form ax^2 + bx + c for some constants a, b, and c.

When you're writing a function that takes in another function for its input, you have to be mindful that the way we represent functions in math is different from the way we represent functions in computer science. In math, f is more or less interchangeable with f(x), and they both refer to a function. In computer science `f(x)`

is a function call on the argument `x`

. This function call evaluates to something like a number or a boolean or whatever. If you want to refer to the function itself, you have to write `f`

. Consider `square(4)`

for instance. It's a function call on the argument 4, and it evaluates to the number 16. The function itself is just `square`

.

So if we wanted to program the derivative function, which requires a function for its input, then we would have to use something like `f`

for the input rather than `f(x)`

. Here's an example:

```
def derivative(f):
"""Return a function representing the derivative of the function f."""
...
```

As you keep reading, just keep in mind the difference between `f`

and `f(x)`

. `f`

is a function, whereas `f(x)`

is a function call on the argument `x`

.

Suppose we have two functions, `f1`

and `f2`

, and we want to find out roughly how different they are from one another. To do that, we'll compare them on three inputs: `x`

, `y`

, and `z`

. Our goal is to find the average difference between `f1`

and `f2`

on those three inputs. Define the function `average_difference(f1, f2, x, y, z)`

to carry out this procedure. You may assume the function `average`

has already been defined for you.

```
def average_difference(f1, f2, x, y, z):
...
```

First let's find out how different `f1`

is from `f2`

, on the input `x`

. If you plotted `f1`

and `f2`

on a graph, then this would be the vertical distance between them at the number `x`

on the x-axis. We might consider the calculation `f1(x) - f2(x)`

. That's not quite right, though, because it could be negative if `f2(x)`

is bigger than `f1(x)`

, and we always want a nonnegative number when we're looking for the difference between two things. To solve this problem, we'll just use the built-in `abs`

function to take the absolute value of `f1(x) - f2(x)`

and make it positive.

```
def average_difference(f1, f2, x, y, z):
x_diff = abs(f1(x) - f2(x))
...
```

Then we can use similar calculations to find out how different `f1`

and `f2`

are from one another, on the inputs `y`

and `z`

.

```
def average_difference(f1, f2, x, y, z):
x_diff = abs(f1(x) - f2(x))
y_diff = abs(f1(y) - f2(y))
z_diff = abs(f1(z) - f2(z))
...
```

Now we know how different `f1`

is from `f2`

on the inputs `x`

, `y`

, and `z`

. But we set out to calculate how different `f1`

was from `f2`

*on average*, so we should take the average of these three differences.

```
def average_difference(f1, f2, x, y, z):
x_diff = abs(f1(x) - f2(x))
y_diff = abs(f1(y) - f2(y))
z_diff = abs(f1(z) - f2(z))
return average(x_diff, y_diff, z_diff)
```

If `average_difference(f1, f2, x, y, z)`

returns a very large number then `f1`

and `f2`

are very different, on average, when we compare them on the inputs `x`

, `y`

, and `z`

. Meanwhile if `average_difference(f1, f2, x, y, z)`

returns 0, then we know `f1(x) == f2(x)`

, `f1(y) == f2(y)`

and `f1(z) == f2(z)`

.

Now we've seen how it could be useful to have a function as an argument, when we're writing a procedure that's meant to operate on a function (or even multiple functions) rather than a number. You might also want to have a function as an argument when you're writing a versatile, general-use program. Let's talk about that some more.

Sometimes you will want a bunch of functions that are all variations on the same main idea. For example, you might want a function that computes the sum of the first n positive numbers, another that computes the sum of the first n squares, and a third that computes the sum of the first n cubes. These three functions are very similar, because they share the main idea of summing the first n terms in a sequence.

Though it may be tempting, it would be a very bad idea to write all three of these functions separately. Why? Let's see what it would look like:

```
def sum_numbers(n):
total = 0
while n > 0:
total += n
n -= 1
return total
def sum_squares(n):
total = 0
while n > 0:
total += square(n)
n -= 1
return total
def sum_cubes(n):
total = 0
while n > 0:
total += cube(n)
n -= 1
return total
```

In the end we have a lot more functions than we really need, and they all repeat basically identical logic. This makes our code super hard to maintain, for two reasons. First of all, it means that if we ever find an error we'll have to change a whole bunch of different functions. Second, it means that if we want to add another similar function, like `sum_fourth_powers`

or `sum_fifth_powers`

, then we would have to write even more functions and before you know it we have an entire library full of code that's all pretty much the same.

So to be clear, you should *never* write multiple functions that all share the same main idea. What's the alternative? Write one function that handles whatever your problems have in common. Then handle the rest with a function that you get as input. We might come up with something like this:

```
def sum_sequence(n):
total = 0
while n > 0:
total += ____
n -= 1
return total
```

Notice how this template fits all three of the functions we wrote above. Depending on how we fill in the blank, it could become `sum_numbers`

, `sum_squares`

, or `sum_cubes`

. We'll use a parameter to specify how we want to fill in the blank:

```
def sum_sequence(n, func):
total = 0
while n > 0:
total += func(n)
n -= 1
return total
```

When we pass in the function `square`

for the parameter `func`

, then `sum_sequence`

becomes identical to `sum_squares`

. When we pass in the function `cube`

for the parameter `func`

, then `sum_sequence`

becomes identical to `sum_cubes`

. In fact, now we can sum any sequence at all — not just the the first n positive numbers, squares, or cubes! Here's an example, assuming `square(n)`

, `cube(n)`

, `fourth_power(n)`

, and `factorial(n)`

are all defined:

```
>>> sum_sequence(3, square) # 3**2 + 2**2 + 1**2
14
>>> sum_sequence(3, cube) # 3**3 + 2**3 + 1**3
36
>>> sum_sequence(3, fourth_power) # 3**4 + 2**4 + 1**4
98
>>> sum_sequence(3, factorial) # 3! + 2! + 1!
9
```

Suppose we want a function that finds (x^2)^2, another function that finds \exp(\exp(x)), and a third function that finds \frac{d}{dx}\frac{d}{dx}f(x). These three functions share the main idea of applying an operator twice to their input. This is called the order-2 composition of a function. Here's how we might go about solving these problems naively, assuming `square`

, `exp`

, and `derivative`

are already defined:

```
def compose_square(x):
return square(square(x))
def compose_exp(x):
return exp(exp(x))
def compose_derivative(x):
return derivative(derivative(x))
```

But this is quickly getting out of hand. Every time we want to get the order-2 composition of another function, we'll have to write a new version of `compose`

for it. Like we saw earlier with the `sum_numbers`

, `sum_squares`

, and `sum_cubes`

functions, we're just making more work for ourselves in the future. Let's try consolidating these three functions into just one.

```
def compose(x):
...
```

For our first step, let's fill in `compose`

with all the stuff in common between `compose_square`

, `compose_exp`

, and `compose_derivative`

.

```
def compose(x):
return ____(____(x))
```

Now we have a template that can fit `compose_square`

, `compose_exp`

, and `compose_derivative`

depending on how we fill in the blank. To determine what needs to go in the blank, let's add another parameter. We'll call it `func`

.

```
def compose(func, x):
return func(func(x))
```

If we pass in the function `square`

for `func`

, then we get the same behaviour as `compose_square`

. If we pass in the function `exp`

for `func`

, then we get the same behaviour as `compose_exp`

. And if we pass in the function `derivative`

for `func`

, then we get the same behaviour as `compose_derivative`

. In fact, we can find the order-2 composition of any function at all! We just have to call `compose`

, passing in whatever function we want for the parameter `func`

. This makes it a very versatile and useful function.

```
>>> compose(square, 2) # (2**2) ** 2
16
>>> compose(add_2, 7) # (7 + 2) + 2
9
```

**Warning**: This section includes the answers to some of the practice problems from the chapter on iteration. If you intend to do those practice problems, and you haven't yet, then you may want to skip this section.

There are a lot of really weird and complicated functions in mathematics, like e^x and \sin(x). Sometimes, rather than working with these functions directly, math folk prefer to approximate them with very big polynomials. (A polynomial is a function that can be written using powers of the same variable, multiplied by constants. Every polynomial can be written like a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0x^0, for some choice of constants a_n, a_{n-1}, \ldots a_0.) Whenever you approximate a complicated function using a big polynomial, that's the Taylor series of the function.

In particular, this is the Taylor series for e^x: \sum_{i=0}^{\infty}\frac{x^i}{i!}=\frac{x^0}{0!}+\frac{x^1}{1!}+\frac{x^2}{2!}+\frac{x^3}{3!}+\frac{x^4}{4!}+\cdots

And this is the Taylor series for \sin(x): \sum_{i=0}^{\infty}(-1)^i\frac{x^{2i+1}}{(2i+1)!}=\frac{x^1}{1!}-\frac{x^3}{3!}+\frac{x^5}{5!}-\frac{x^7}{7!}+\frac{x^9}{9!}-\cdots

Since both of these Taylor series are infinitely long polynomials, you can't compute the whole thing for either of them. Instead, most math folk just take the first n terms in a Taylor series and that usually works good enough for whatever calculation they're doing. So pretend for a moment that we're cool math folk, and we've written the two functions below:

```
def taylor_approximate_exp(x, n):
"""Return our best approximation for exp(x), using the first n terms in its Taylor series."""
total, i = 0, 0
while i < n:
total += (x ** i) / factorial(i)
i += 1
return total
def taylor_approximate_sin(x, n):
"""Return our best approximation for sin(x), using the first n terms in its Taylor series."""
total, i = 0, 0
while i < n:
is_neg = i % 2 == 1
ith_odd = 2 * i + 1
total += is_neg * (x ** ith_odd) / factorial(ith_odd)
i += 1
return total
```

Let's try consolidating them into just one function.

```
def taylor_approximate(x, n):
...
```

We'll start, as always, by filling it in with the stuff in common between `taylor_approximate_exp`

and `taylor_approximate_sin`

.

```
def taylor_approximate(x, n):
total, i = 0, 0
while i < n:
total += ____
i += 1
return total
```

Now we have a template that can fit `taylor_approximate_exp`

as well as `taylor_approximate_sin`

, depending on how we fill in the blank. As with our previous examples, we'll add a parameter to determine what needs to go there. Since this parameter is a function that computes the ith term in the Taylor polynomial, we'll call it `ith_term`

. Looking at our code above, we can see that the stuff in this blank depends on both `i`

and `x`

, so we should provide them both to `ith_term`

function.

```
def taylor_approximate(ith_term, x, n):
total, i = 0, 0
while i < n:
total += ith_term(i, x)
i += 1
return total
```

Depending on what `ith_term`

does with `i`

and `x`

, this code could behave like `taylor_approximate_exp`

or `taylor_approximate_sin`

. Now we just need some functions to pass in for `ith_term`

.

We'll start with the one that computes the ith term in the Taylor polynomial for e^x. Referring back to `taylor_approximate_exp`

and `taylor_approximate_sin`

, we should arrive at something like this:

```
def ith_exp_term(i, x):
"""Return the ith term in the Taylor polynomial for exp(x)."""
return (x ** i) / factorial(i)
def ith_sin_term(i, x):
"""Return the ith term in the Taylor polynomial for sin(x)."""
is_neg = i % 2 == 1
ith_odd = 2 * i + 1
return is_neg * (x ** ith_odd) / factorial(ith_odd)
```

Now `taylor_approximate(ith_sin_term, 3, 10)`

will compute our best approximation for \sin(3) using the first 10 terms of the Taylor polynomial for \sin(x). Similarly, `taylor_approximate(ith_exp_term, 3, 10)`

will compute our best approximation for \exp(3) using the first 10 terms of the Taylor polynomial for \exp(x).

```
>>> taylor_approximate(ith_sin_term, 3, 10) # Approximately sin(3).
4.938377459671376
>>> taylor_approximate(ith_exp_term, 3, 10) # Approximately exp(3).
20.063392857142855
```

And if we ever want to find the Taylor approximation for a new function, like \cos(x) or \ln(1+x), then we can still use our general-purpose `taylor_approximate`

function. All we'd have to do is define a real short and simple function (like `ith_cos_term`

or whatever) to get the ith term in the Taylor polynomial that we desire.

The main takeaway from all this? Don't write a bunch of functions that do basically the same stuff. Instead just make one, and delegate the details to a function you get in your input. Often times, if you want to make your code more versatile, that's the way to do it. Make sure you understand everything we've covered so far, before you keep reading.

One day you might need to write a program whose output has to be a function. Once again, consider writing the derivative function. Since the derivative of a function is a function, you'd need `derivative(f)`

to return a function.

You may someday also want a function that's really hard or tedious to write, like something that computes the order-n composition of a function. Then, rather than writing the function you want, you might find it easier to write a function that produces the function you want.

These are just two examples of times when you'll need to write a function whose job is first of all to create a new function, and second to return that function. We'll start by learning how these functions work.

As you might guess, we can define a function within another function by using a `def`

statement. For example, here's some code that defines a function `f`

and then calls `f`

. When this happens, it first defines a function `g`

within the frame for `f`

, and then returns 5. The corresponding pyagram is below. Note that we get an error when we try looking for `g`

in the global frame, since `g`

is only defined in frame 1.

```
>>> def f():
... def g():
... return 5
... return 5
...
>>> f()
5
>>> g
Error: 'g' is not defined
```

Recall from the chapter on functions, the parent frame of a function is the frame where it's defined. In the example above, for instance, notice how `f`

is defined in the global frame so it's parent is global, but `g`

is defined within frame 1 so it's parent is frame 1.

Parent frames are important because they tell you where to look up variables. So for example, if you need to know what `x`

is but it's not in your current frame, then you look for it in the parent frame. If you still don't see it there, then look for `x`

in your parent frame's parent frame. And if you don't see it there either, then you go to that one's parent frame, and so on until you reach the global frame. If you don't find `x`

even after searching the global frame, then Python throws an error. To review the basics of pyagrams, parent frames, and variable lookup, refer back to the chapter on functions.

Now I could stop talking about parent frames right here, but I want to make sure you really understand how they work. In my experience a lot of students get the misconception that the parent of a function is the frame where it's called. **To be totally clear, that's wrong. The parent of a function is the frame where it's defined.** This example will illustrate the importance of that difference:

```
>>> def f():
... x = 4
... def g():
... return x
... return g
...
>>> x = 8
>>> h = f()
>>> h()
4
```

Let's draw the corresponding pyagram. Again, it may help to review the procedure for drawing pyagrams, from the chapter on functions.

The takeaway is that the parent of a function is where it's defined, not where it's called.

Now that we know how Python handles functions that define and return other functions, let's discuss how to approach writing them. I've seen a lot of students try to write higher-order functions line by line, in order from start to end. These students usually end up kind of frustrated and confused. And frankly, that makes sense. You're not supposed to write code like that. Instead, you should approach it layer by layer.

What do I mean by that? Pretend you're writing a function that's meant to have another function defined inside it. For now, just focus on the outer function. While you write it, ignore the function that's supposed to be defined inside. (Feel free to write the `def`

statement, but no more.) Just pretend the inner function is already done, and you're only being asked to write the outer function. Then, *after* you're done with that and you're confident it works, you can write the inner function.

The reasoning here is that you don't want to get caught up writing two functions at once. It can get very confusing, if you're constantly switching between writing the outer function and the inner function. Things are a lot easier if you just focus on one function at a time.

It might also help to consider how you want your function call to behave, when you're done. Use these questions to help guide you:

- What should you get from calling your function? If it's supposed to return a function, then what does that one return when you call it?
- What information does your outer function need? Where's it get that information from? Is it the parameters, or somewhere else?
- What information does your inner function need? Where's it get that information from? Is it the parameters, or does it look for information in the outer function's frame? (Recall from earlier, how parent frames and variable lookup work.) Or does it perhaps get the information it needs from somewhere else?

Define `get_avg_func(f, g)`

, which takes two functions `f`

and `g`

for its input, and outputs a function of `x`

whose output is half way in between `f(x)`

and `g(x)`

for all numbers `x`

. You may assume the function `average`

has already been defined for you.

```
def get_avg_func(f, g):
...
```

First of all, we know we want `get_avg_func(f, g)`

to return a function. It sounds like a good idea for us to define a function that we can return! Let's do that. We'll call it `avg_func(x)`

, and it'll return a value half way in between `f(x)`

and `g(x)`

for all numbers `x`

.

```
def get_avg_func(f, g):
def avg_func(x):
...
...
```

But remember, for now we're going to focus on writing `get_avg_func`

. We'll only move on to `avg_func`

after `get_avg_func`

is complete. So, what else does `get_avg_func`

need to do? According to the problem description, it's supposed to return the function whose output is half way in between `f(x)`

and `g(x)`

for all numbers `x`

. In other words, it has to return `avg_func`

. Let's add that to our code.

```
def get_avg_func(f, g):
def avg_func(x):
...
return avg_func
```

And now we're all done writing `get_avg_func`

! All that's left is to write `avg_func`

, so that it actually works like we want it to. For it to return a value half way in between `f(x)`

and `g(x)`

, we just need to find the average of those two values. Luckily we are allowed to use the `average`

function, so let's just use that.

```
def get_avg_func(f, g):
def avg_func(x):
return average(f(x), g(x))
return avg_func
```

Now try drawing a pyagram for the code below. Assume `average(x, y)`

directly returns `(x + y) / 2`

.

```
>>> avg_sq_cu = get_avg_func(square, cube)
>>> avg_sq_cu(3) # The average of 3**2 and 3**3.
18.0
```