Summer of learning: Week 1
2025 May 25
Unfortunately, I crossed the International Date Line sometime during the week, gaining half a day (and then losing three or four days to jet lag). This is a great excuse to be lazy.
$\S$1. Goals for next week
- I suppose now I should just finish Colley, which seems doable considering how much time it took to get through chapters 1 and 2 (a bit heavier than I expected...)
- Still on chapter 5 of LADR (oops).
- Did not even start reading Tao's Analysis (too tired for it, ngl)
- It turns out the French waves text uses $j$ for the imaginary unit, which is, uh... oh well. That is all...
- USACO: if I can just start on binary search after finishing prefix sums, that would be nice ngl (skill issue) (internet issue)
$\S$2. Notes on Vector Calculus (Colley)
$\S$2.1. Limits
To define the derivative in functions of several variables, we must first generalize the limit to multiple dimensions.
So, given a function $f:{\mathbb R}^n\to {\mathbb R}^m$, we may define:
Definition 1.
We write
\[ \lim_{x\to a} f(x)=L \]
if $||f(x)-L||$ can be made arbitrarily small by taking $||x-a||$ sufficiently small (but nonzero).
Notice that this definition fails more often than in the one-variable case: it requires that $f(x)$ approach $L$ along every path by which $x$ approaches $a$.
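The classic counterexample (standard, not specific to Colley): let $f(x,y)=\frac{xy}{x^2+y^2}$. Along the line $y=mx$,
\[ f(x,mx)=\frac{mx^2}{x^2+m^2x^2}=\frac{m}{1+m^2}, \]
which depends on $m$, so the limit at the origin does not exist even though the limit along each individual line does.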
Of course, one can then define continuity as $\lim_{x\to a}f(x)=f(a)$, which is exactly the one-variable definition except that $x$ and $a$ are now vectors.
$\S$2.2. Derivatives
First, the partial derivative:
Definition 2.
The partial derivative of $f:{\mathbb R}^n\to {\mathbb R}$ with respect to a variable $x_i$, denoted $ \frac{\partial f}{\partial x_i}$, or $D_{x_i}f$, or $f_{x_i}$, is what you get by taking the derivative with respect to $x_i$ while holding all other variables constant.
That is, it is the derivative of $F(x_i)=f(x_1,\dots,x_i,\dots,x_n)$ where $F:{\mathbb R}\to {\mathbb R}$, or if you prefer,
\[ \lim_{h\to 0}\frac{f(\dots,x_i+h,\dots) - f(\dots,x_i,\dots)}{h}. \]
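For instance (a quick example of my own): if $f(x,y)=x^2y+\sin y$, then treating the other variable as a constant gives
\[ \frac{\partial f}{\partial x}=2xy, \qquad \frac{\partial f}{\partial y}=x^2+\cos y. \]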
In general, for functions $f:{\mathbb R}^n\to {\mathbb R}^m$, derivatives are now represented by the Jacobian matrix
\[ Df=\begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}, \]
whose $i$th row collects the partials of the component $f_i$,
which we will motivate here.
Because we can no longer speak of the derivative as a “slope”, we consider instead functions which are good linear approximators, since this is what the derivative is in one dimension anyway.
For the case of two dimensions:
Definition 3.
A function $f:{\mathbb R}^2\to{\mathbb R}$ is differentiable at $(a,b)$ if the partial derivatives $f_x(a,b)$ and $f_y(a,b)$ exist and the function $h(x,y)=f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b)$ satisfies
\[ \lim_{(x,y)\to (a,b)}\frac{f(x,y)-h(x,y)}{||(x,y)-(a,b)||}=0. \]
Furthermore, $z=h(x,y)$ is then the equation of the tangent plane to the graph of $f$ at $(a,b)$.
To spoil the point a little bit: it turns out that $h$ is a good approximator for $f$ at $a$ exactly when
\[ h(x)=f(a)+\nabla f(a)\cdot (x-a) = f(a)+Df(a)(x-a). \]
So in general we say
Definition 4.
A function $f:{\mathbb R}^n\to{\mathbb R}^m$ is differentiable at $a$ if its Jacobian $Df(a)$ exists and the function $h(x)=f(a)+Df(a)(x-a)$ satisfies
\[ \lim_{x\to a}\frac{||f(x)-h(x)||}{||x-a||}=\lim_{x\to a}\frac{||f(x)-(f(a)+Df(a)(x-a))||}{||x-a||}=0. \]
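For a concrete instance (my own example, not Colley's): if $f:{\mathbb R}^2\to {\mathbb R}^2$ is given by $f(x,y)=(x^2y, \sin x)$, then
\[ Df(x,y)=\begin{bmatrix} 2xy & x^2 \\ \cos x & 0 \end{bmatrix}, \]
where each row is the gradient of one component function.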
$\S$2.3. Higher-order partial derivatives
One can show that the multi-variable derivative shares many properties with the single-variable derivative, that is,
Proposition 5. Let $f,g:{\mathbb R}^n\to {\mathbb R}^m$ be two functions differentiable at $a$, and let $c$ be a scalar. Then:
- The function $h=f+g$ is differentiable at $a$, and $Dh(a)=Df(a)+Dg(a)$,
- The function $k=cf$ is differentiable at $a$, and $Dk(a)=cDf(a)$,
- If $m=1$, the product $fg$ is differentiable at $a$, and
\[ D(fg)(a)=g(a)Df(a)+f(a)Dg(a), \]
- If $m=1$ and $g(a)\neq 0$, then $f/g$ is differentiable at $a$, and
\[ D(f/g)(a)=\frac{g(a)Df(a)-f(a)Dg(a)}{g(a)^2}. \]
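As a quick sanity check on the product rule (my own example): take $f(x,y)=x$ and $g(x,y)=y$, so $fg=xy$ and
\[ D(fg)=\begin{bmatrix} y & x \end{bmatrix}=y\begin{bmatrix} 1 & 0 \end{bmatrix}+x\begin{bmatrix} 0 & 1 \end{bmatrix}=g\,Df+f\,Dg. \]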
Theorem 6. (Clairaut’s Theorem)
If $f$ is a function such that $f_x$, $f_y$, $f_{xx}$, $f_{xy}$, $f_{yx}$, $f_{yy}$ all exist and are continuous, then
\[ \frac{\partial^2}{\partial x\partial y} f =\frac{\partial^2}{\partial y\partial x} f. \]
Proof.
We would like to evaluate both sides of the equation at $(a,b)$, assuming all derivatives exist there. Consider the function
\[ D(\Delta x, \Delta y)=f(a+\Delta x, b+\Delta y)-f(a+\Delta x,b)-f(a,b+\Delta y)+f(a,b). \]
We can see $D(\Delta x,\Delta y)$ as a difference of differences in $f$, grouped either vertically or horizontally. Grouping vertically, let $F(x)=f(x,b+\Delta y)-f(x,b)$, so that
\begin{align*}
D(\Delta x,\Delta y)&=(f(a+\Delta x, b+\Delta y)-f(a+\Delta x,b))-(f(a,b+\Delta y)-f(a,b)) \\
&= F(a+\Delta x)-F(a) \\
&= F'(c)\Delta x \qquad \text{(mean value theorem)}
\end{align*}
where $a\le c\le a+\Delta x$.
Then $F'(c) = f_{x}(c,b+\Delta y)-f_{x}(c,b)=f_{xy}(c, d)\Delta y$ for some $b\le d\le b+\Delta y$, once again by the mean value theorem. Hence $D(\Delta x,\Delta y)=f_{xy}(c,d)\,\Delta x\,\Delta y$; since $(c,d)\to (a,b)$ as $(\Delta x,\Delta y)\to (0,0)$ and $f_{xy}$ is continuous, this gives
\[ f_{xy}(a,b)=\lim_{(\Delta x,\Delta y)\to (0,0)} \frac{D(\Delta x, \Delta y)}{\Delta x\Delta y}. \]
A similar argument, grouping the differences horizontally instead, shows that $f_{yx}(a,b)$ equals the same limit.
$\square$
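A quick check that the theorem behaves as advertised (my own example): for $f(x,y)=x^2y^3$,
\[ f_x=2xy^3,\quad f_{xy}=6xy^2 \qquad \text{and} \qquad f_y=3x^2y^2,\quad f_{yx}=6xy^2, \]
so the mixed partials agree, as they must.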
$\S$2.4. Chain rule
Proposition 7.
For two functions $f:{\mathbb R}^n\to {\mathbb R}^m$ and $g:{\mathbb R}^m\to {\mathbb R}^k$, differentiable at $a\in {\mathbb R}^n$ and at $f(a)\in {\mathbb R}^m$ respectively,
\[ D(g \circ f)(a)=Dg(f(a))\,Df(a). \]
I think this formulation is very reasonable, because it’s sort of like regular matrix multiplication: each derivative matrix describes a linear transformation between two vector spaces, and composing the functions corresponds to multiplying the matrices.
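To see it in action (my own example): take $f(t)=(\cos t,\sin t)$ and $g(x,y)=x^2+y^2$. Then
\[ D(g\circ f)(t)=Dg(f(t))\,Df(t)=\begin{bmatrix} 2\cos t & 2\sin t \end{bmatrix}\begin{bmatrix} -\sin t \\ \cos t \end{bmatrix}=0, \]
which checks out: $g\circ f\equiv 1$ is constant on the unit circle, so its derivative vanishes.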