Summer of learning: Week 1
2025 May 25
Unfortunately, I crossed the International Date Line sometime during the week, gaining half a day (and then losing three or four days to jet lag). This is a great excuse to be lazy.
$\S$1. Goals for next week
- I suppose now I should just finish Colley, which seems doable considering how much time it took to get through chapters 1 and 2 (a bit heavier than I expected...)
- Still on chapter 5 of LADR (oops).
- Did not even start reading Tao's Analysis (too tired for it, ngl)
- It turns out the French waves text uses $j$ for the imaginary unit, which is, uh... oh well. That is all...
- USACO: if I can just start on binary search after finishing prefix sums, that would be nice ngl (skill issue) (internet issue)
$\S$2. Notes on Vector Calculus (Colley)
$\S$2.1. Limits
To define the derivative in functions of several variables, we must first generalize the limit to multiple dimensions.
So, given a function $f:{\mathbb R}^n\to {\mathbb R}^m$, we may define:
Definition 1.
We write
\[ \lim_{x\to a} f(x)=L \]
if $||f(x)-L||$ can be made arbitrarily small by taking $||x-a||$ sufficiently small (but nonzero).
Notice that this definition fails more often than in the one-variable case: it requires that $f(x)$ approach $L$ along every path by which $x$ approaches $a$.
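The classic counterexample (standard, not specific to Colley): let $f(x,y)=\frac{xy}{x^2+y^2}$. Along the line $y=mx$,
\[ f(x,mx)=\frac{mx^2}{x^2+m^2x^2}=\frac{m}{1+m^2}, \]
which depends on $m$, so the limit at the origin does not exist even though the limit along each individual line does.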
Of course, one can then define continuity as $\lim_{x\to a}f(x)=f(a)$, which is exactly the one-variable definition except that $x$ and $a$ are now vectors.
$\S$2.2. Derivatives
First, the partial derivative:
Definition 2.
The partial derivative of $f:{\mathbb R}^n\to {\mathbb R}$ with respect to a variable $x_i$, denoted $ \frac{\partial f}{\partial x_i}$, or $D_{x_i}f$, or $f_{x_i}$, is what you get by taking the derivative with respect to $x_i$ while holding all other variables constant.
That is, it is the derivative of $F(x_i)=f(x_1,\dots,x_i,\dots,x_n)$ where $F:{\mathbb R}\to {\mathbb R}$, or if you prefer,
\[ \lim_{h\to 0}\frac{f(\dots,x_i+h,\dots) - f(\dots,x_i,\dots)}{h}. \]
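For instance (a quick example of my own): if $f(x,y)=x^2y+\sin y$, then treating the other variable as a constant gives
\[ \frac{\partial f}{\partial x}=2xy, \qquad \frac{\partial f}{\partial y}=x^2+\cos y. \]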
In general, for functions $f:{\mathbb R}^n\to {\mathbb R}^m$, derivatives are now represented by the Jacobian matrix
\[ Df=\begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}, \]
whose $i$th row collects the partials of the component $f_i$,
which we will motivate here.
Because we can no longer speak of the derivative as a “slope”, we consider instead functions which are good linear approximators, since this is what the derivative is in one dimension anyway.
For the case of two dimensions:
Definition 3.
A function $f:{\mathbb R}^2\to{\mathbb R}$ is differentiable at $(a,b)$ if the partial derivatives $f_x(a,b)$ and $f_y(a,b)$ exist and the function $h(x,y)=f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b)$ satisfies
\[ \lim_{(x,y)\to (a,b)}\frac{f(x,y)-h(x,y)}{||(x,y)-(a,b)||}=0. \]
Furthermore, $z=h(x,y)$ is then the equation of the tangent plane to the graph of $f$ at $(a,b)$.
To spoil the point a little bit: it turns out that $h$ is a good approximator for $f$ at $a$ exactly when
\[ h(x)=f(a)+\nabla f(a)\cdot (x-a) = f(a)+Df(a)(x-a). \]
So in general we say
Definition 4.
A function $f:{\mathbb R}^n\to{\mathbb R}^m$ is differentiable at $a$ if its Jacobian $Df(a)$ exists and the function $h(x)=f(a)+Df(a)(x-a)$ satisfies
\[ \lim_{x\to a}\frac{||f(x)-h(x)||}{||x-a||}=\lim_{x\to a}\frac{||f(x)-(f(a)+Df(a)(x-a))||}{||x-a||}=0. \]
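For a concrete instance (my own example, not Colley's): if $f:{\mathbb R}^2\to {\mathbb R}^2$ is given by $f(x,y)=(x^2y, \sin x)$, then
\[ Df(x,y)=\begin{bmatrix} 2xy & x^2 \\ \cos x & 0 \end{bmatrix}, \]
where each row is the gradient of one component function.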
$\S$2.3. Higher-order partial derivatives
One can show that the multi-variable derivative shares many properties with the single-variable derivative, that is,
Proposition 5. Let $f,g:{\mathbb R}^n\to {\mathbb R}^m$ be two functions differentiable at $a$, and let $c$ be a scalar. Then:
- The function $h=f+g$ is differentiable at $a$, and $Dh(a)=Df(a)+Dg(a)$,
- The function $k=cf$ is differentiable at $a$, and $Dk(a)=cDf(a)$,
- If $m=1$, the product $fg$ is differentiable at $a$, and
\[ D(fg)(a)=g(a)Df(a)+f(a)Dg(a), \]
- If $m=1$ and $g(a)\neq 0$, then $f/g$ is differentiable at $a$, and
\[ D(f/g)(a)=\frac{g(a)Df(a)-f(a)Dg(a)}{g(a)^2}. \]
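As a quick sanity check on the product rule (my own example): take $f(x,y)=x$ and $g(x,y)=y$, so $fg=xy$ and
\[ D(fg)=\begin{bmatrix} y & x \end{bmatrix}=y\begin{bmatrix} 1 & 0 \end{bmatrix}+x\begin{bmatrix} 0 & 1 \end{bmatrix}=g\,Df+f\,Dg. \]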
Theorem 6. (Clairaut’s Theorem)
If $f$ is a function such that $f_x$, $f_y$, $f_{xx}$, $f_{xy}$, $f_{yx}$, $f_{yy}$ all exist and are continuous, then
\[ \frac{\partial^2}{\partial x\partial y} f =\frac{\partial^2}{\partial y\partial x} f. \]
Proof.
We would like to evaluate both sides of the equation at $(a,b)$, assuming all derivatives exist there. Consider the function
\[ D(\Delta x, \Delta y)=f(a+\Delta x, b+\Delta y)-f(a+\Delta x,b)-f(a,b+\Delta y)+f(a,b). \]
We can see $D(\Delta x,\Delta y)$ as a difference of differences in $f$, grouped either vertically or horizontally. Grouping vertically, let $F(x)=f(x,b+\Delta y)-f(x,b)$, so that
\begin{align*}
D(\Delta x,\Delta y)&=(f(a+\Delta x, b+\Delta y)-f(a+\Delta x,b))-(f(a,b+\Delta y)-f(a,b)) \\
&= F(a+\Delta x)-F(a) \\
&= F'(c)\Delta x \qquad \text{(mean value theorem)}
\end{align*}
where $a\le c\le a+\Delta x$.
Then $F'(c) = f_{x}(c,b+\Delta y)-f_{x}(c,b)=f_{xy}(c, d)\Delta y$ for some $b\le d\le b+\Delta y$, once again by the mean value theorem. Hence $D(\Delta x,\Delta y)=f_{xy}(c,d)\,\Delta x\,\Delta y$; since $(c,d)\to (a,b)$ as $(\Delta x,\Delta y)\to (0,0)$ and $f_{xy}$ is continuous, this gives
\[ f_{xy}(a,b)=\lim_{(\Delta x,\Delta y)\to (0,0)} \frac{D(\Delta x, \Delta y)}{\Delta x\Delta y}. \]
A similar argument, grouping the differences horizontally instead, shows that $f_{yx}(a,b)$ equals the same limit.
$\square$
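A quick check that the theorem behaves as advertised (my own example): for $f(x,y)=x^2y^3$,
\[ f_x=2xy^3,\quad f_{xy}=6xy^2 \qquad \text{and} \qquad f_y=3x^2y^2,\quad f_{yx}=6xy^2, \]
so the mixed partials agree, as they must.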
$\S$2.4. Chain rule
Proposition 7.
For two functions $f:{\mathbb R}^n\to {\mathbb R}^m$ and $g:{\mathbb R}^m\to {\mathbb R}^k$, differentiable at $a\in {\mathbb R}^n$ and at $f(a)\in {\mathbb R}^m$ respectively,
\[ D(g \circ f)(a)=Dg(f(a))\,Df(a). \]
I think this formulation is very reasonable, because it’s sort of like regular matrix multiplication: each derivative matrix describes a linear transformation between two vector spaces, and composing the functions corresponds to multiplying the matrices.
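To see it in action (my own example): take $f(t)=(\cos t,\sin t)$ and $g(x,y)=x^2+y^2$. Then
\[ D(g\circ f)(t)=Dg(f(t))\,Df(t)=\begin{bmatrix} 2\cos t & 2\sin t \end{bmatrix}\begin{bmatrix} -\sin t \\ \cos t \end{bmatrix}=0, \]
which checks out: $g\circ f\equiv 1$ is constant on the unit circle, so its derivative vanishes.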