Monday, June 20, 2022

Recent Questions - Mathematics Stack Exchange


Why is this interaction happening in multiple regression?

Posted: 20 Jun 2022 09:14 PM PDT

I attach R code.

  b <- rnorm(9999, 0, 999)
  s <- rnorm(9999, 0, 1)

These have nearly zero correlation.

I can make a y from this b and s:

y <- s + b      

(I omit the error variance so that I can see clearly what's going on.)

  summary(lm(y~s+b+s:b))    

The interaction should not be significant, but it is. Why did this happen?
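One way to see what is going on (a Python sketch, with numpy's least squares standing in for R's lm): with no error term the fit is exact, so the estimated interaction coefficient is numerically zero; the "significance" reported by lm comes from the residual variance being pure floating-point noise, which makes the t-statistics meaningless.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.normal(0, 999, 9999)
s = rng.normal(0, 1, 9999)
y = s + b  # no error term, exactly as in the R snippet

# Design matrix for y ~ s + b + s:b
X = np.column_stack([np.ones_like(y), s, b, s * b])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # interaction coefficient (last entry) is ~0 up to round-off
```

The coefficients come back essentially as (0, 1, 1, 0); the residuals are at machine precision, so any nonzero coefficient divided by that tiny residual variance looks "significant" to a t-test.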

Is the free algebra a PI-algebra?

Posted: 20 Jun 2022 09:12 PM PDT

We take a nonempty set $\mathbf{X}$ and write $\mathbf{X}=\{X_i\mid i\in I\}$. A finite sequence of elements from $\mathbf{X}$ will be denoted by $X_{i_1}X_{i_2}\cdots X_{i_m}$ and called a word. Note that the empty sequence is not excluded; we denote it by $1$ and call it the empty word. Let us define multiplication by juxtaposition, that is, $$(X_{i_1}X_{i_2}\cdots X_{i_m})\cdot(X_{j_1}X_{j_2}\cdots X_{j_n})=X_{i_1}X_{i_2}\cdots X_{i_m}X_{j_1}X_{j_2}\cdots X_{j_n},$$ and the set of all words becomes a monoid with unity $1$, which we denote by $\mathbf{X}^\ast$ and call the free monoid on $\mathbf{X}$. For convenience, we can write $X_i^n$ for $X_iX_i\cdots X_i$ ($n$ times). Accordingly, every $w\neq1$ in $\mathbf{X}^\ast$ can be written as $w=X_{i_1}^{k_1}X_{i_2}^{k_2}\cdots X_{i_r}^{k_r}$, in which $i_j\in I$ and $k_j\in\mathbb{N}$. The free algebra on $\mathbf{X}$ over $F$, denoted by $F\langle \mathbf{X}\rangle$, is the monoid algebra of $\mathbf{X}^\ast$ over $F$. The elements in $\mathbf{X}$ and $F\langle \mathbf{X}\rangle$ are called indeterminates and noncommutative polynomials, respectively. A nonzero noncommutative polynomial $$f=f(X_1,X_2,\cdots,X_n)\in F\langle X_1,X_2,\cdots\rangle$$ is called a polynomial identity of an $F$-algebra $A$ if $$f(a_1,a_2,\cdots,a_n)=0$$ for all $a_1,a_2,\cdots,a_n\in A$, and $A$ is then called a PI-algebra.

Question. Is the free algebra a PI-algebra?

I cannot even find an answer to this question. I also wonder whether we can prove it by contradiction; however, I have gotten no results. Please help me.

Is someone able to solve this double integral?

Posted: 20 Jun 2022 09:11 PM PDT

I've been trying to solve this integral but I simply can't find the change of variables that makes it "easy". I'd really appreciate the help.

[image of the double integral]

Suppose the $n$th partial sum of a series $\sum_{n=1}^{\infty} a_n$ is $S_n = \frac{n-2}{n+6}$. Find a formula for $a_n$. Does the series converge?

Posted: 20 Jun 2022 09:02 PM PDT

Divergence Test: If $\lim_{n\to\infty} a_n \neq 0$, then the series $\sum_{n=1}^{\infty} a_n$ diverges.
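Assuming the title means $S_n = \frac{n-2}{n+6}$, the standard move is $a_1 = S_1$ and $a_n = S_n - S_{n-1}$ for $n \ge 2$; a quick sketch:

```python
from fractions import Fraction

# Assuming S_n = (n - 2)/(n + 6) as in the title, a_n = S_n - S_{n-1} for n >= 2.
def S(n):
    return Fraction(n - 2, n + 6)

def a(n):
    return S(1) if n == 1 else S(n) - S(n - 1)

print(a(1), [a(n) for n in (2, 3, 4)])   # a_n = 8/((n+5)(n+6)) for n >= 2
print(float(S(10**6)))                   # S_n -> 1, so the series converges to 1
```

Since $S_n \to 1$, the series converges (to $1$); the divergence test is inconclusive here because $a_n \to 0$.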

Function $y=f(x)$ having two points of discontinuity

Posted: 20 Jun 2022 09:01 PM PDT

If function $y = f(x)$ has only two points of discontinuity say $x_1, x_2$, where $x_1, x_2 < 0$, then the number of points of discontinuity of $f(-|x+1|)$  is_____

I checked this question; no other information is provided. It appeared in a sample paper from a top coaching institute. How do we proceed?

Trace of positive definite matrix trick

Posted: 20 Jun 2022 08:59 PM PDT

I'm reading a paper and came across the following simplification

$$ \text{tr}(A^TBA) = \sum_{i}a_i^TBa_i $$

for a positive definite matrix $AA^T$ and an orthonormal matrix $B$. Would anyone kindly show why this holds?
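A numerical check (a sketch; here $a_i$ denotes the $i$-th column of $A$). The identity is just the definition of the trace as the sum of diagonal entries of $A^TBA$, and needs neither positive definiteness nor orthonormality:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(4, 4))  # the identity holds for ANY B

lhs = np.trace(A.T @ B @ A)
# a_i below is the i-th column of A, so a_i^T B a_i is the i-th diagonal
# entry of A^T B A; summing the diagonal is exactly the trace.
rhs = sum(A[:, i] @ B @ A[:, i] for i in range(A.shape[1]))
print(lhs, rhs)
```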

Question about the Largest Singular Value of a Graph

Posted: 20 Jun 2022 08:49 PM PDT

Suppose $G$ is a simple graph of order $n$ with $m$ edges. Let $\sigma_1\geq \sigma_2\geq \cdots \geq\sigma_n$ be the singular values of the adjacency matrix of the graph. The graph energy is defined by $\mathcal{E}(G)=\sigma_1+\cdots +\sigma_n$. Is it true that $\sigma_1< \mathcal{E}/2$ ? Is there a sharper bound?

On the End Point Cases for Sobolev inequalities

Posted: 20 Jun 2022 08:36 PM PDT

Suppose that we have a sequence $(f_{N})$ of compactly supported smooth functions such that \begin{align*} \int_{\mathbb{R}^{n}}|f_{N}(x)|dx&\leq C<\infty,\\ \int_{\mathbb{R}^{n}}|\nabla f_{N}(x)|dx&\leq C<\infty \end{align*} for some constant $C>0$ independent of all such $f_{N}$, can we deduce that \begin{align*} \int_{\mathbb{R}^{n}}(1+|x|)^{a}|f_{N}(x)|dx\leq C_{a}<\infty \end{align*} for some $a>0$, and $C_{a}>0$ is a constant independent of $f_{N}$?

At least, I hope to get \begin{align*} \int_{\mathbb{R}^{n}}\log(1+|x|)|f_{N}(x)|dx\leq C<\infty. \end{align*}

Sobolev inequalities won't help since this is the end point case for $L^{1}$.

Evaluating the integral of the absolute value of the second derivative $f''(x)$

Posted: 20 Jun 2022 08:35 PM PDT

Let $f$ be a function on $[0, T]$ with continuous second derivative. Suppose $M=\max_{x\in [0,T]}f(x)$ and $m=\min_{x\in [0,T]}f(x)$; show that $M-m\leq T\int_{0}^{T}|f''(x)|dx$.

My thought: since $T\int_{0}^{T}f''(x)dx\leq T\int_{0}^{T}|f''(x)|dx$, we have $T(f'(T)-f'(0))\leq T\int_{0}^{T}|f''(x)|dx$. How can I check that $M-m\leq T(f'(T)-f'(0))$?

Fisher information of sum of random variables

Posted: 20 Jun 2022 08:41 PM PDT

I am working on the Fisher information of a density $\rho$, defined as $I(\rho) = \int \left\lVert \nabla \log \rho(x) \right\rVert^2 \rho(x)\, \mathrm{d} x$. I couldn't find many resources on this quantity. Can anyone suggest some notes for me?

I have two random variables with densities $X \sim \rho_0(x)$, $Y \sim \rho_1(y)$. I am studying the sum of these two variables, $Z = X + Y$, which follows another density $p(z)$. What is the relationship between the Fisher information of these three variables, i.e., between $I(p)$, $I(\rho_0)$, and $I(\rho_1)$?

If it is possible to derive a property of $I(p)$ alone, that would also be great.

Thank you.
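One classical relationship for independent $X, Y$ is Stam's inequality, $1/I(p) \ge 1/I(\rho_0) + 1/I(\rho_1)$, with equality in the Gaussian case. A numerical sketch checking the Gaussian equality case (the quadrature grid and variances are arbitrary choices):

```python
import numpy as np

def fisher_info(pdf, x):
    # I(rho) = integral of (d/dx log rho)^2 * rho, approximated on a grid
    rho = pdf(x)
    score = np.gradient(np.log(rho), x)          # d/dx log rho
    return np.sum(score**2 * rho) * (x[1] - x[0])

def gauss(mu, s):
    return lambda x: np.exp(-(x - mu)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

x = np.linspace(-30, 30, 200001)
s0, s1 = 1.5, 2.0
I0 = fisher_info(gauss(0, s0), x)                # ~1/s0^2 for a Gaussian
I1 = fisher_info(gauss(0, s1), x)
Iz = fisher_info(gauss(0, np.hypot(s0, s1)), x)  # Z = X + Y is N(0, s0^2 + s1^2)
print(1 / Iz, 1 / I0 + 1 / I1)                   # equal in the Gaussian case
```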

Derivative of an iterative vector function?

Posted: 20 Jun 2022 08:21 PM PDT

I am trying to solve the following problem.

I have some vector-matrix product of the form $\textbf{y} = \textbf{w} \cdot \textbf{X}(\textbf{x})$. Here, $\textbf{x}=[x_1, x_2, ... , x_N]$ and the matrix $ \textbf{X}(\textbf{x}) $ takes the form: $$ \textbf{X}(\textbf{x}) = \begin{bmatrix} 1 & 1 & ... & 1 \\ x_1 & x_2 & ... & x_N \\ u^{(1)}_1 & u^{(2)}_1 & ... & u^{(N)}_1 \\ \vdots & \vdots & & \vdots \\ u^{(1)}_n & u^{(2)}_n & ... & u^{(N)}_n \end{bmatrix} = \begin{bmatrix} 1 & 1 & ... & 1 \\ x_1 & x_2 & ... & x_N \\ \textbf{u}^{(1)} & \textbf{u}^{(2)} & ... & \textbf{u}^{(N)} \end{bmatrix} $$

The vectors $\textbf{u}$ are found iteratively: $$\textbf{u}^{(i)} = f(x_{i-1},\textbf{u}^{(i-1)})$$ with $\textbf{u}^{(1)} = \textbf{0}$. The row vector $\textbf{w}$ is a vector of constants and does not depend on $\textbf{x}$. I want to find $$\frac{d\textbf{y}}{d\textbf{x}} = \frac{d}{d\textbf{x}} \left( \textbf{w} \cdot \textbf{X}(\textbf{x}) \right)$$ but am unsure how to handle the vectors $\textbf{u}$, since they are defined iteratively. Any help would be greatly appreciated. If I am not making sense and should clarify anything, please let me know.

Thanks!
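One way to handle the iterative $\textbf{u}$'s is forward-mode accumulation: propagate $d\textbf{u}^{(i)}/d\textbf{x}$ alongside $\textbf{u}^{(i)}$ through the recursion via the chain rule. A sketch with a hypothetical smooth $f$ (the tanh dynamics are just an example, not from the original post), checked against finite differences:

```python
import numpy as np

n, N = 3, 5                        # length of each u, number of columns
rng = np.random.default_rng(0)
M = rng.normal(size=(n, n))
x = rng.normal(size=N)

def f(xi, u):
    return np.tanh(xi + M @ u)     # example dynamics, elementwise tanh

def build_us(x):
    us = [np.zeros(n)]             # u^(1) = 0
    for i in range(1, N):
        us.append(f(x[i - 1], us[-1]))
    return us

# Forward accumulation: J[i] = d u^(i) / d x, an (n x N) Jacobian per column.
# Chain rule: du^(i)/dx_j = tanh'(pre) * (delta_{j,i-1} + M du^(i-1)/dx_j).
us, J = build_us(x), [np.zeros((n, N))]
for i in range(1, N):
    pre = x[i - 1] + M @ us[i - 1]
    d = 1 - np.tanh(pre)**2                  # derivative of tanh at pre
    Ji = d[:, None] * (M @ J[i - 1])         # propagated through u^(i-1)
    Ji[:, i - 1] += d                        # direct dependence on x_{i-1}
    J.append(Ji)

# Check the last Jacobian column against finite differences
eps, k = 1e-6, 2
xp = x.copy(); xp[k] += eps
fd = (build_us(xp)[-1] - us[-1]) / eps
print(np.allclose(J[-1][:, k], fd, atol=1e-4))
```

Once each $d\textbf{u}^{(i)}/d\textbf{x}$ is available, $d\textbf{y}/d\textbf{x}$ follows by assembling the column-wise derivatives of $\textbf{X}$ and applying $\textbf{w}$ linearly.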

An infinitely axiomatizable class of structures whose reduct is finitely axiomatizable, and vice versa

Posted: 20 Jun 2022 08:37 PM PDT

Let $L$ and $L'$ be first-order languages such that $L' \subseteq L$. Does there exist a class $K$ of $L$-structures which is axiomatizable but not finitely axiomatizable, such that the $L'$ reduct of $K$ is finitely axiomatizable? Also, what about vice versa? That is, does there exist a class $K$ of $L$-structures which is finitely axiomatizable, such that the $L'$ reduct of $K$ is axiomatizable but not finitely axiomatizable?

Is the definition of covariant / contravariant basis as partial derivatives / gradient consistent with the Kronecker delta of vector spaces and dual?

Posted: 20 Jun 2022 09:18 PM PDT

If I am not messing it up the vector basis (covariant) $\{{\bf e}_i\}_{i=1}^n$ and the dual basis (contravariant) $\{{\bf e}^i\}_{i=1}^n$ are related through the Kronecker delta:

$${\bf e}^i({\bf e}_j)=\delta^i_j$$

I have been asking about Pavel Grinfeld's use of $\bf Z^i$ and $\bf Z_i$ as a symbol for these basis vectors and dual, and I don't want to insist asking on his approach in general given the feedback received on my prior post: the idea of orthogonality and same-space did not seem to resonate as a good (or correct) pedagogical tool.

However, I have found this presentation online which seems to parallel PG's approach to some extent, including the use of $Z$ as notation, yet it departs at perhaps a critical point. I want to know whether this different, but not completely unrecognizable, presentation is solid:

The idea here is that the ambient space is Euclidean, so there is a position vector, and a partial derivative of this position vector with respect to each of the coordinates, so as to generate a set of basis vectors.

What is interesting is that the "other set of basis vectors" (I presume the dual vectors) is plausibly defined as the gradients of the coordinate functions.

So for polar,

$$\begin{align} {\bf e}_1&=\frac{\partial R}{\partial r} = \cos\theta\, \hat i + \sin \theta\, \hat j\\ {\bf e}_2&=\frac{\partial R}{\partial \theta} = -r\sin\theta\, \hat i + r\cos \theta\, \hat j\\ {\bf e}^1&=\nabla r = \nabla\sqrt{r^2 \cos^2\theta+ r^2 \sin^2\theta} = \cos\theta\, \hat i + \sin \theta\, \hat j\\ {\bf e}^2&=\nabla \theta = \nabla\arctan \frac{\sin\theta}{\cos\theta} = -\frac{\sin\theta}{r} \hat i + \frac{\cos \theta}{r} \hat j \end{align}$$

which indeed fulfill ${\bf e}^i({\bf e}_j)=\delta^i_j$, with no extra normalization required.

The question is twofold:

  1. With the vector basis and its dual defined this way, will the relationship ${\bf e}^i({\bf e}_j)=\delta^i_j$ always be preserved beyond this example of polar coordinates, or only if the basis vectors happen to be orthogonal to each other?
  2. Is this a way of introducing the concept of vectors /covectors consistent with the understanding of different spaces? Is it a good / generalizable definition?
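A numerical sketch checking the pairing at a sample point, using $\nabla r$ and $\nabla\theta$ as the dual vectors (note that $\nabla\theta$ carries a single factor of $1/r$, since $\nabla\theta = (-y/(x^2+y^2),\, x/(x^2+y^2))$):

```python
import numpy as np

r, th = 1.7, 0.6
e1 = np.array([np.cos(th), np.sin(th)])            # dR/dr
e2 = np.array([-r * np.sin(th), r * np.cos(th)])   # dR/dtheta
d1 = np.array([np.cos(th), np.sin(th)])            # grad r
d2 = np.array([-np.sin(th) / r, np.cos(th) / r])   # grad theta

pairings = np.array([[d1 @ e1, d1 @ e2],
                     [d2 @ e1, d2 @ e2]])
print(pairings)  # identity matrix, with no extra normalization
```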

Why is $\operatorname{supp}(τ_hf) = h + \operatorname{supp}(f)$?

Posted: 20 Jun 2022 08:43 PM PDT

Why is $\operatorname{supp}(τ_hf) =h + \operatorname{supp}(f)$?

Let $f\colon \mathbb R^n\to\mathbb R$ be a continuous function. Define $τ_hf$ by $(τ_hf)(x)=f(x-h)$, the translation of $f$ by $h$.

Prove that $\operatorname{supp}(τ_hf) = h + \operatorname{supp}(f)$.

I'm trying to understand why this happens.

I am trying to prove by double inclusion. Let $A =\operatorname{supp}(τ_hf)$ and $B = h + \operatorname{supp}(f)$ .

I took an element of $A$ and could not show that it is also in $B$ (similarly, taking an element of $B$, I cannot get to $A$). I don't know if I'm on the right track with this proof.

I would really appreciate it if someone could help me with this proof or give me some idea of how to do it.

Thanks

Why is $ 2 $ not in the range of $a^2+5b^2$, where $a$ and $b$ are integers?

Posted: 20 Jun 2022 08:04 PM PDT

I'm posting this doubt because the original user is no longer active.

Question :

Show if $2$ and $1+\sqrt{-5}$ belong to the same principal ideal $I$ of $\mathbb{Z}[\sqrt{-5}]$ then $I=\mathbb{Z}[\sqrt{-5}]$.

I have some confusion on this answer

It is written that

  • Observe that we have a multiplicative function $ N: \Bbb{Z}[\sqrt{-5}] \to \Bbb{N}_{0} $ defined by $$ \forall (a,b) \in \Bbb{Z}^{2}: \quad N(a + b \sqrt{-5}) \stackrel{\text{df}}{=} a^{2} + 5 b^{2}. $$
  • Suppose that $ \{ 2,1 + \sqrt{-5} \} \subseteq \langle x \rangle $ for some $ x \in \Bbb{Z}[\sqrt{-5}] $.
  • As $ N(2) = 4 $, we have $ N(x) | 4 $, and as $ N(1 + \sqrt{-5}) = 6 $, we have $ N(x) | 6 $.
  • Hence, $ N(x) \in \{ 1,2 \} $.
  • It is easily verified, however, that $ 2 $ is not in the range of $ N $, so we must have $ N(x) = 1 $.
  • Therefore, $ x = \pm 1 $, which yields $ \langle x \rangle = \mathbb{Z}[\sqrt{-5}] $.

My confusion: why is $2$ not in the range of $N$?

My thinking: here $\mathbb{N}_0$ denotes the set of nonnegative integers.

If $N(x)|4 $ and $N(x)|6$ then $N(x)|24 \implies 2|24$

Therefore $2$ must be in the range of $N$
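The gap in this reasoning is that $N(x)\mid 4$ only restricts $N(x)$ to divisors of $4$; nothing forces every divisor to actually be attained as a value of $N$. And $a^2+5b^2=2$ is impossible, since $5b^2\le 2$ forces $b=0$ and then $a^2=2$ has no integer solution. A brute-force sketch:

```python
# Brute-force illustration: N(a + b*sqrt(-5)) = a^2 + 5b^2 never equals 2.
# (a^2 + 5b^2 = 2 forces b = 0 and a^2 = 2, which has no integer solution.)
values = {a * a + 5 * b * b for a in range(-100, 101) for b in range(-100, 101)}
print(2 in values)  # False
```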

Prove that the sequence $a, f(a),\cdots$ contains at least one square of an integer

Posted: 20 Jun 2022 08:33 PM PDT

Let $f(n) = n+\lfloor \sqrt{n}\rfloor$. Prove that for every positive integer $a$, the sequence $a, f(a),\cdots$ contains at least one square of an integer.

Can someone explain why, in the solution below, $f^r(m)$ is a square for some $r$ with $0\leq r\leq 2j$? Why is it $2j$ instead of $j$? From my understanding, the "excess" strictly decreases by $1$ provided it is positive.

[solution image]
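An empirical sketch (not the argument in the solution image) confirming the statement for small starting values:

```python
from math import isqrt

# Iterating f(n) = n + floor(sqrt(n)) from a starting value a, check that a
# perfect square eventually appears in the orbit.
def f(n):
    return n + isqrt(n)

def reaches_square(a, max_steps=10_000):
    n = a
    for _ in range(max_steps):
        if isqrt(n) ** 2 == n:
            return True
        n = f(n)
    return False

print(all(reaches_square(a) for a in range(1, 1000)))  # True
```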

Adjacency matrix for soccer ball (football)?

Posted: 20 Jun 2022 09:19 PM PDT

Does anyone have a ready reference to the adjacency matrix for the truncated icosahedron (soccer/football)?

(In my retirement, I'm reviewing some of my college maths, and adjacency matrices have recently captured my attention.)

The Facebook pages of the Institute of Mathematics, Gurgaon include beautiful work on the Platonic solids and other polyhedra, including these images (for example), captured in these screenshots:

tetrahedron

icosahedron

dodecahedron

tesseract

Finding a Curve in a Vector Field that has 0 Flow

Posted: 20 Jun 2022 08:17 PM PDT

I am going through Vector Calculus for the first time and I had a thought which I assume has a neat solution, but I could not find a good answer for it:

Imagine I am given a 2D vector field; I want to draw the curve that at every moment has 0 flow.

You can imagine, in the case of $F(x,y) = x\,\hat i + y\,\hat j$, that this curve would be any circle of any radius, traversed either clockwise or counterclockwise.

Is there a generalized solution for this problem, or some reference on how to derive such curves?

Any help is appreciated!
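A numerical check of the motivating example (a sketch; "flow" here is taken as the line integral $\oint \mathbf{F}\cdot d\mathbf{r}$): for $F=(x,y)$ any circle gives zero, since $F$ is radial and the tangent is perpendicular to it, while the rotational field $G=(-y,x)$ gives $2\pi R^2$.

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 20001)
R = 2.5
x, y = R * np.cos(t), R * np.sin(t)
dx, dy = np.gradient(x, t), np.gradient(y, t)   # tangent components dx/dt, dy/dt
dt = t[1] - t[0]
flow_F = np.sum(x * dx + y * dy) * dt           # approx integral of F . dr
flow_G = np.sum(-y * dx + x * dy) * dt          # same for G = (-y, x)
print(flow_F, flow_G)                           # ~0 and ~2*pi*R^2
```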

Not following the derivation of $\frac{dy}{dx}=-\frac{F_x}{F_y}$

Posted: 20 Jun 2022 08:13 PM PDT

I've seen similar questions to mine asked on the forum, but I haven't seen answers that address the part I'm confused about.

My calculus textbook (Thomas from Pearson) derives the following formula to "take some of the algebra out of implicit differentiation":

Suppose the function $F(x,y)$ is differentiable and the equation $F(x,y)=0$ defines $y$ implicitly as a differentiable function of $x$. Then at any point where $F_y\neq 0$, we have $$\frac{dy}{dx}=-\frac{F_x}{F_y}.$$

(The formula itself is pretty intuitive to me, except for the negative sign.) I feel like I am misinterpreting the derivation given, as it seems to be using $F(x,y)$ to denote two different functions and treating them as if they are the same. The derivation goes like this:

Suppose that (1) the function $F(x,y)$ is differentiable and that (2) the equation $F(x,y)=0$ defines $y$ implicitly as a differentiable function of $x$. Since $w=F(x,y)=0$, the derivative $\frac{dw}{dx}$ must be zero.

As I understand this, they are defining a new function $w:\{(x,y):F(x,y)=0\}\rightarrow\{0\}$, a level curve of the original $F(x,y)$, which is zero everywhere on its domain, and we're to suppose that its domain defines $y$ implicitly in terms of $x$. But then they continue:

... Computing the derivative [of the equation $w=F(x,y)=0$] from the chain rule, we find $$0=\frac{dw}{dx}=F_x\frac{dx}{dx}+F_y\frac{dy}{dx}=F_x+F_y\frac{dy}{dx}.$$ Therefore, we have $$\frac{dy}{dx}=-\frac{F_x}{F_y}.$$

This is where I get confused. In the example questions, it is clear that $F_x$ and $F_y$ denote the partial derivatives of the original function $F(x,y)$ of which $w$ is a level curve. But this use of the chain rule seems to assume that those are also the partials of $w$ (which is a constant function, and should have zero derivatives, no?).

I'm interpreting this as a special case of $$\frac{dw}{dt}=\frac{\partial w}{\partial x}\frac{dx}{dt}+\frac{\partial w}{\partial y}\frac{dy}{dt}$$ where $t=x$, and where $\frac{\partial w}{\partial x}$ and $\frac{\partial w}{\partial y}$ are written as $F_x$ and $F_y$. But I'm not seeing how the former and the latter partials are equivalent.

Why can we assume both that $\frac{dw}{dx}=0$ and that $F_x=\frac{\partial w}{\partial x}$, when $F_x$ is not zero in general? Or is that assumption not actually being made by using the chain rule this way? What am I missing or getting wrong here? I'd really appreciate it if someone would set me on the right track so that I can get some intuition for why this theorem works. Thanks!
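The resolution is that $F_x$ and $F_y$ are always the partials of the two-variable $F$; it is the one-variable composite $w(x)=F(x,y(x))$ that is constant along the curve, and the chain rule combines the generally nonzero partials with $dy/dx$ so that the total derivative vanishes. A quick symbolic check (a sketch; the folium curve below is just an example I chose):

```python
import sympy as sp

x, y = sp.symbols('x y')
F = x**3 + y**3 - 6 * x * y       # folium of Descartes, F(x, y) = 0
implicit = sp.idiff(F, y, x)      # dy/dx via implicit differentiation
formula = -sp.diff(F, x) / sp.diff(F, y)
print(sp.simplify(implicit - formula))  # 0: the two computations agree
```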

Integral $\int_{-\infty}^{\infty}\frac{x^2\cos(x)}{x^4-1}dx$

Posted: 20 Jun 2022 08:41 PM PDT

I want to compute $$\int_{-\infty}^{\infty}\frac{x^2\cos(x)}{x^4-1}dx$$ using the Residue Theorem. The poles are given by $A:=\{-i,i,1,-1\}$. Now, it seems to me that a semicircle contour won't work, because $1,-1\in[-R, R]$, $R>1$. Or does it help if I indent around $-1,1$ with my contour? I also thought about using the integrand $f(z)=\frac{z^2e^{i z}}{z^4-1}$ and taking the real part in the end.
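A sketch of the indented-contour route (an assumption on my part: going around the real poles yields a principal value): with $f(z)=z^2e^{iz}/(z^4-1)$, $\mathrm{PV}\int = \operatorname{Re}\big[2\pi i\operatorname{Res}_{z=i}f + \pi i(\operatorname{Res}_{z=1}f + \operatorname{Res}_{z=-1}f)\big]$, since only $z=i$ lies inside the upper half-plane and the real poles contribute half residues.

```python
import sympy as sp

z = sp.symbols('z')
f = z**2 * sp.exp(sp.I * z) / (z**4 - 1)

res = lambda p: sp.limit((z - p) * f, z, p)   # all poles are simple
pv = 2 * sp.pi * sp.I * res(sp.I) + sp.pi * sp.I * (res(1) + res(-1))
pv = sp.simplify(sp.re(sp.expand_complex(pv)))
print(pv)   # pi*(exp(-1) - sin(1))/2, approx -0.744
```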

$\frac{1}{n}$ is the exact threshold probability for a random graph $\Gamma(n,p)$ to be planar

Posted: 20 Jun 2022 08:13 PM PDT

As far as I know, $\hat{p} = \frac{1}{n}$ is the exact threshold probability for a random graph $\Gamma(n,p)$ to be planar, i.e. if $p = p(n) \geq (1 + \varepsilon) \hat{p}$ for some $\varepsilon > 0$, then $\mathbb{P}(\Gamma(n,p) \text{ is planar}) \to 0$ as $n \to \infty$ and if $p = p(n) \leq (1 - \varepsilon) \hat{p}$ for some $\varepsilon > 0$, then $\mathbb{P}(\Gamma(n,p) \text{ is planar}) \to 1$ as $n \to \infty$. But I don't fully understand how to accurately prove it.

Suppose $p = p(n) \leq (1 - \varepsilon) \hat{p}$. Since planarity is a decreasing property, it is enough to prove $\mathbb{P}(\Gamma(n,p) \text{ is planar}) \to 1$ in the case when $pn \to (1-\varepsilon) < 1$. It is known that if $pn \to (1-\varepsilon) < 1$, then the probability that the components of $\Gamma(n,p)$ are only trees and unicyclic graphs tends to $1$. But trees and unicyclic graphs are planar, so $\mathbb{P}(\Gamma(n,p) \text{ is planar}) \to 1$. Is this argument correct?

Suppose now that $p = p(n) \geq (1 + \varepsilon) \hat{p}$. In this case, as far as I know, for any $r>0$ the probability of $\Gamma(n,p)$ containing a complete graph on $r$ vertices ($K_r$) tends to $1$. I don't know how to prove it, but if it is true, then $\mathbb{P}(\Gamma(n,p) \text{ is planar}) \to 0$, since $K_5$ is not planar. Is there a simpler way to prove it? As far as I know, in this case if we assume additionally that $np \to (1+\varepsilon) > 1$, then with probability tending to $1$ the size of the largest component of $\Gamma(n,p)$, divided by $n$, tends to $\beta$ in probability, where $\beta$ is the solution of the equation $\beta + e^{-(1 + \varepsilon)\beta} = 1$. Is this statement somehow applicable to my question?

Any help would be appreciated.
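The subcritical argument can be probed numerically (a Monte-Carlo sketch with a hand-rolled union-find; "tree or unicyclic" is exactly "edges ≤ vertices" for each component):

```python
import random

def all_components_simple(n, p, rng):
    # True iff every component of a G(n, p) sample is a tree or unicyclic.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < p]
    for i, j in edges:
        parent[find(i)] = find(j)
    verts, ecnt = {}, {}
    for v in range(n):
        verts[find(v)] = verts.get(find(v), 0) + 1
    for i, j in edges:
        ecnt[find(i)] = ecnt.get(find(i), 0) + 1
    return all(ecnt.get(c, 0) <= verts[c] for c in verts)

rng = random.Random(0)
trials = [all_components_simple(400, 0.5 / 400, rng) for _ in range(20)]
print(sum(trials) / len(trials))   # close to 1, matching the subcritical theory
```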

Sequences with Tetrahedral Stacks of Oranges

Posted: 20 Jun 2022 09:12 PM PDT

A grocer stacks oranges into a tetrahedral pyramid (i.e., a pyramid with a triangular base and 3 triangular sides). Let Cn be the total number of oranges, where n is the height (number of levels) of the pyramid (C1 = 1, C2 = 4, C3 = 10, etc.). Give a closed-form formula for Cn and report the value for C100.

This is what I have tried so far, but I am unsure how correct it is:

Tn = (n * (n + 1) * (n + 2)) / 6

The proof uses the fact that the nth triangular number is given by

Trin = (n * (n + 1)) / 2

It proceeds by induction.

Base Case

T1 = 1 = 1 * 2 * 3 / 6

Inductive Step

Tn+1 = Tn + Trin+1

Tn+1 = [(n * (n + 1) * (n + 2)) / 6] + [((n + 1) * (n + 2)) / 2]

Tn+1 = ((n + 1) * (n + 2) * (n + 3)) / 6
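A quick check of the closed form and the requested value (note Cn = C(n+2, 3) as a binomial coefficient, and that adding a layer adds the (n)th triangular number):

```python
from math import comb

# Closed form: C_n = n(n+1)(n+2)/6 = C(n+2, 3)
def C(n):
    return n * (n + 1) * (n + 2) // 6

print([C(n) for n in (1, 2, 3)])   # [1, 4, 10]
print(C(100))                      # 171700
```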

Scaling Invariant Functions for Weights

Posted: 20 Jun 2022 08:12 PM PDT

Let $n\in\mathbb{N}$ and $w_i\in\mathbb{R}^{+}$ for all $1\leq i\leq n$. Consider $\left(w_{1},\dots,w_{n}\right)\in\mathbb{R}^{n}$.

What are necessary and sufficient conditions for functions $f:\mathbb{R}^{+}\to\mathbb{R}^{+}$ so the following holds?

(Desired Property): For all $c\in \mathbb{R}^{+}$ there exists $\lambda \in \mathbb{R}^{+}$ such that $$\left(f\left(cw_{1}\right),\dots,f\left(cw_{n}\right)\right)=\lambda\left(f\left(w_{1}\right),\dots,f\left(w_{n}\right)\right).$$


For context: these $w_i$ are weights, and I'm looking for functions applicable to the weights that are independent of scaling.

For example, the function $\exp:\mathbb{R}^{+}\to\mathbb{R}^{+}$ doesn't have this property: taking $w=(1,2)$ and $c=2$, we get $\left(e^{2},e^{4}\right)\neq \lambda \left(e^{1},e^{2}\right)$ for all $\lambda \in \mathbb{R}^{+}$.

A sufficient condition is $f\left(xy\right)=k f\left(x\right)f\left(y\right)$ for some $k\in \mathbb{R}^{+}$ and all $x,y\in \mathbb{R}^{+}$.

Is this condition also necessary? How would one prove this / what's a counter-example? I'd appreciate any help finding the conditions, preferably explicit conditions on the functions.
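Power functions $f(w)=k\,w^p$ do satisfy the desired property, with $\lambda=c^p$ (and they also satisfy the multiplicative condition above with $k=1$ when the prefactor is $1$). A quick numerical contrast with $\exp$ (a sketch; the exponents and weights are arbitrary choices):

```python
import numpy as np

w = np.array([1.0, 2.0, 5.0])
c = 3.0

f_pow = lambda x: 4.0 * x**2.5
ratios_pow = f_pow(c * w) / f_pow(w)     # constant across components: c^2.5
ratios_exp = np.exp(c * w) / np.exp(w)   # not constant: exp((c-1) w_i)

print(ratios_pow, ratios_exp)
```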

MPC trajectory optimization intuition

Posted: 20 Jun 2022 08:57 PM PDT

Introduction

I've been struggling to create some intuition around trajectory optimization for MPC. I'd really, really like some help in creating that intuition, whether it be some pointers, some literature recommendations, some videos, anything. Something to help me build an understanding as to how these problems are actually optimized.

Context

I am a recent dual electrical engineering/applied math graduate, so if possible I'd really appreciate some tangible intuition into how the optimization process works for trajectory optimization.

I've done some courses on numerical linear algebra and optimization during my studies, so I'm familiar with how second-order methods like L-BFGS are defined and implemented (and even know some tricks like implicit formulation of the Hessian to save memory).

However I am having trouble understanding the intuition behind open loop optimisation methods like trajectory optimization.

Every tutorial on the internet I've looked at, when it gets to the step of optimizing the NLP problem, uses CasADi, which often just whacks an NLP-formatted problem into its magic solve() function and, hey presto, you've got your optimal trajectory.

I understand that the tools in CasADi are really popular and powerful, and the people who came up with the algorithms that solve these problems are amazing. In a production environment, I'd probably use CasADi as a solver as well, since it's well researched and robust.

However, I'd really like to understand the intuition as to how these problems are solved fundamentally before I go taking advanced tools like CasADi for granted.

Toy problem

I'll elaborate with a toy problem I came up with.

I'll be opting for a Direct transcription method as its easier to solve (or so I've heard).

Toy problem picture

Here I've shown a classic system used to introduce chaotic motion to undergrads doing math degrees.

It's a polar system that is inherently stable.

Given some point ($r$,$\theta$), I want to determine the optimal trajectory to the stable ring ($r^2 = \lambda$).

We denote the position of the point (or state) as $x_k$

Since $\dot{r}$ is dependent on $\lambda$, as our point moves closer to the stable ring, each step size will become smaller (as the change in $r$ becomes smaller and smaller the closer we get to $\lambda$).

This might not be optimal, as I may have a time constraint for when the point should be within some small margin around the stable ring.

We could control the system using MPC, denoting the control state at time $k$ as $u_k$, and giving it the ability to change the value of $\lambda$ at each timestep.

This way the point could be moved to the original position of $\lambda$ much faster.

fleshing out problem detail picture

Here I've fleshed out the control problem in a bit more detail to show my understanding.

$x^r$ and $u^r$ are the target state and resting control state respectively.

In order to achieve an optimal trajectory, we need to minimize some cost function.

Here I've denoted the cost function, $J_N(x,u)$ as the sum of all the running costs $l(x_k,u_k)$.

This cost function allows for balancing of the state error with control error (i.e. we want a snappy response, but we might want to limit how harsh the changes in controller input can be).

Where the issue arises

I'm aware of two methods to solve this problem

  • Shooting methods (based on simulation, better for problems with simple control and no path constraints)
  • Collocation methods (based on function approximation, better for problems with complicated controls and/or path constraints)

I had a go at formulating an implementation for the single shooting method as I heard it was the easiest.

shooting method picture

Here $U^T$ denotes the control vector, and $X^T$ the state vector.

Question now is, what do I do from here?

$X^T$ is based on the prior states of $x_k$, hence the problem has an infinite number of solutions.

Trajectory optimization, according to what I've read, provides an open loop solution to the problem, meaning a global picture of the system is not required in order to achieve some minimum.

But implementing this on a computer is not straightforward.

Do I "guess" what the path should be?

When I put said path into the cost function, how do I determine how the path should change in order to be less wrong (as changing one segment would require re-computing the other segments)?

I have no idea how collocation works, and most of the tutorials online aren't super clear about how the idea works.

Please I'd really appreciate some pointers, advice, literature recommendations, anything. Something to help me see how this wonderful idea works.

Kind regards,

Despicable_B

EDIT 1:

After some more research, I think I now understand the difference between shooting and (direct) collocation methods.

Shooting Methods:

Shooting methods optimize over the set of control states, and the state trajectory is implicitly determined from those control states.

  • The upside is that shooting methods tend not to violate the dynamics of the problem, as the dynamics are an implicit constraint (in the context of the toy problem, this means that the point will obey the motions of the vector field in a sequential manner).

  • The downside is, shooting methods will have extreme difficulty converging for non-linear, chaotic systems (as a small change in input will lead to a disproportionate change in output, and follow a chaotic path).
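The single-shooting idea can be sketched in a few lines (a minimal sketch, assuming a hypothetical 1D stand-in r' = r(λ - r²) for the toy system, with finite-difference gradients and crude backtracking instead of a real NLP solver): optimize only the control sequence, and let the state trajectory be implied by rolling the dynamics forward.

```python
import numpy as np

def rollout(u, r0=0.2, dt=0.05):
    # Forward-simulate r' = r*(lam - r^2), with u_k choosing lam at each step
    rs = [r0]
    for lam in u:
        r = rs[-1]
        rs.append(r + dt * r * (lam - r**2))
    return np.array(rs)

def cost(u, r_ref=1.0, R=1e-3):
    rs = rollout(u)                     # states are implicit in the controls
    return np.sum((rs - r_ref)**2) + R * np.sum(u**2)

u = np.ones(40)                         # initial guess for the control sequence
c0 = cost(u)
for _ in range(100):                    # finite-difference gradient + backtracking
    g = np.array([(cost(u + 1e-6 * e) - cost(u)) / 1e-6 for e in np.eye(len(u))])
    step = 0.1
    while step > 1e-8 and cost(u - step * g) >= cost(u):
        step /= 2
    u -= step * g
print(c0, cost(u))                      # the cost decreases monotonically
```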

(Direct) Collocation method:

Rather than optimizing over the control states, the collocation method optimizes over the evaluation states, determining the controls implicitly from that optimal path.

  • The upside is this means that each point is only dependent on a limited number of previous states (making it easier to optimise)

  • The downside is, the dynamics of the system are an explicit constraint, and if not properly configured, can lead to solutions that violate the dynamics of the system (for a robot, this might mean the robot thinks it can walk through walls).

Interestingly, this lecture from Berkeley asserts that second order methods are much better suited to this kind of problem, as they're better at solving with multiple constraints simultaneously.

The lecturer gives some examples and specifically mentions the Gauss-Newton method. I'll continue studying the problem for now, will be back with more updates.

EDIT 2:

Sorry for the delay; I managed to figure out how to implement the shooting method and solve my problem. The results speak for themselves:

[shooting method results plot]

Of course it doesn't look like much, this system is very stable by itself and thus there really isn't much that can be done to optimize it.

But I did learn a bunch about what is required to make this work, and more broadly why we're seeing things like self-driving cars, Boston Dynamics' amazing robots, and SpaceX's self-landing rockets pop up in recent years.

You'll recall that I mentioned solvers like CasADi were a bit of an oddity to me, as I didn't quite understand what black magic made their methods work.

Lo and behold, the answer was in the title all along: Automatic Differentiation.

After reading through fedja's implementation of the collocation method, I noticed the use of the finite-difference method, which got me thinking: "What does machine learning use to determine the descent direction for the millions of parameters in modern neural networks?"

Ari Seff does an absolutely magnificent job explaining how the technology works, (those who're curious about understanding Backpropagation in ML from a more general perspective should definitely give it a watch).

Essentially, the "primitives" of your loss function (i.e. the dot product, various for loops, Euclidean norms, etc.) can all be broken down into a "tree" which can be traversed (either backwards or forwards) to get the derivative with respect to any of your inputs.

The loss function can even be MIMO (multi-input, multi-output), and you can compute individual rows or columns of the Jacobian or Hessian, allowing memory constraints to be factored into your optimisation algorithm.

This led me to the JAX Python package (a research project funded by Google, I believe) that allows Python code to be accelerated on GPUs or TPUs using a JIT (just-in-time) compiler. The compiler performs the automatic differentiation for us, allowing us to use whatever method we like to optimise (I used stochastic gradient descent for this one because it's the easiest to understand).

Hence, even a complicated loss function like the one shown above can be optimized using autodiff (and even accelerated on modern GPU/TPU hardware).
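A minimal dual-number sketch of forward-mode autodiff (a toy illustration of the primitive-propagation idea, not JAX's actual tracing machinery): each primitive operation carries its value and its derivative forward together.

```python
import math

class Dual:
    # A number paired with its derivative; arithmetic applies the chain rule.
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# d/dx [x^2 + 3*sin(x)] at x = 1 is 2 + 3*cos(1)
x = Dual(1.0, 1.0)                   # seed the derivative with 1
y = x * x + 3 * sin(x)
print(y.val, y.dot)
```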

EDIT 3:

Here is the collocation method results as well.

[collocation method results plot]

It was computed using the same optimisation method as the shooting method, and with the same Q and R weights.

I'll put together a github project with the python code used to generate the graphs (as well as a readme to explain the intuition behind the methods, their pros and cons, and where to go from here if you want to do more with this tech). It should be out soon.

Closing comments

After completing this problem, I can see how the ML-led tech boom, SpaceX's self-landing rockets, and self-driving cars came into existence.

The capability gained from having a computer systematically process any complex function (i.e. computer code) and generate its gradient, Jacobian and Hessian is a critical point in the history of the applied math and computer science fields, and should be appreciated in all its glory.

Acknowledgements

I could not have figured this out without the support of fedja; thank you so much for your insight, help and patience. You're truly a marvel to behold.

Evaluate/asymptotic of the sum $\sum_{a = 1}^{L \left({N}\right)} \left({- 1}\right)^{\left\lfloor{N/\left({2\, a + 1}\right)}\right\rfloor}$

Posted: 20 Jun 2022 07:55 PM PDT

I am trying to evaluate the sum and its asymptotic limit as $N \rightarrow \infty$ of

$$\sum_{a = 1}^{L \left({N}\right)} \left({- 1}\right)^{\left\lfloor{N/\left({2\, a + 1}\right)}\right\rfloor}$$

where the upper limit can be either $L \left({N}\right) = {L}_{1} \left({N}\right) = \left\lfloor{\frac{N}{6} - \frac{1}{2}}\right\rfloor$ or $L \left({N}\right) = {L}_{2} \left({N}\right) = \left\lfloor{\frac{1}{2} \left\lceil{\sqrt{N + 2}}\right\rceil}\right\rfloor-1$.

It looks like I should count the number of even and odd terms of $$\left({- 1}\right)^{\left\lfloor{N/\left({2\, a + 1}\right)}\right\rfloor}$$

The case for ${L}_{2} \left({N}\right)$ when $N \rightarrow \infty$ looks to be $\sim \sqrt{N}$. The similar case for ${L}_{1} \left({N}\right) \sim \alpha\, N$ for some constant $\alpha$.

My literature search has not revealed any number-theoretic equivalents, nor anything on the asymptotic limits. I suspect that the analogous divisor-counting sum $\sum_{n = 1}^{x} \left\lfloor{\frac{N}{n}}\right\rfloor$ has been broken down into even and odd floor-function results, but again I am not finding any literature on this subject.

These sums arise from computing the exact number of factorable quartics of the form $\left({x+a}\right)^{2} \left({x+b}\right) \left({x+c}\right)$ where $a, b, c$ are integer roots.
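For what it's worth, the sum is cheap to evaluate numerically, which may help check the conjectured growth rates (this is exploratory code of my own, not from the question; `(N - 3) // 6` equals $\lfloor N/6 - 1/2\rfloor$ for integer $N$):

```python
from math import isqrt

def S(N, L):
    """Compute sum_{a=1}^{L} (-1)^floor(N / (2a + 1))."""
    return sum((-1) ** (N // (2 * a + 1)) for a in range(1, L + 1))

def L1(N):
    return (N - 3) // 6          # floor(N/6 - 1/2) for integer N

def L2(N):
    c = isqrt(N + 2)
    if c * c < N + 2:
        c += 1                   # c = ceil(sqrt(N + 2))
    return c // 2 - 1            # floor(ceil(sqrt(N + 2)) / 2) - 1

# Print the sum for both upper limits at increasing N to eyeball the growth.
for N in (10**3, 10**4, 10**5):
    print(N, S(N, L1(N)), S(N, L2(N)))
```

Plotting $S(N, L_2(N))/\sqrt{N}$ and $S(N, L_1(N))/N$ against $N$ would then give a quick empirical test of the two conjectured rates.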

Is this elementary proof correct?

Posted: 20 Jun 2022 08:35 PM PDT

Theorem:

If:

$$x^p=y^p+z^p$$ $$\gcd(x,y,z)=1$$ $$p \nmid xyz$$

Then:

$$x^{p^l} \equiv y^{p^l}+z^{p^l} \pmod{p^{l+1}}$$

for any $l$.

Proof:

Assume:

$$q_p(x) \equiv q_p(y) \equiv q_p(z) \equiv 0 \pmod{p}$$

where $q_p(x)$ is the Fermat quotient of $x$ modulo $p$.

Let:

$$\frac{x^{p+1}-y^{p+1}}{x-y}=(p+1)y^p+a(x-y) \equiv x+y \pmod{p^2}$$ $$\frac{x^{p+1}-z^{p+1}}{x-z}=(p+1)z^p+b(x-z) \equiv x+z \pmod{p^2}$$

Note:

$$\implies az \equiv by \equiv x \equiv y+z\pmod{p}$$ $$\implies ab \equiv a+b \pmod{p}$$

And:

$$az \equiv x-py \pmod{p^2}$$ $$by \equiv x-pz \pmod{p^2}$$ $\implies$ $$z(a-1) \equiv y(1-p) \pmod{p^2}$$ $$y(b-1) \equiv z(1-p) \pmod{p^2}$$

$\implies $

$$(a-1)^{p-1} \equiv (b-1)^{p-1} \equiv (1-p)^{p-1} \equiv 1+p \pmod{p^2}$$

Also note:

$$a^p-(a-1)^p-1 \equiv \left(\frac{x}{z}\right)^p-\left(\frac{y}{z}\right)^p -1 \equiv 0 \pmod{p^2}$$ $$b^p-(b-1)^p-1 \equiv \left(\frac{x}{y}\right)^p-\left(\frac{z}{y}\right)^p -1 \equiv 0 \pmod{p^2}$$

Now let $n_1=a-1$ and note:

$$a^p-(a-1)^p-1=(n_1+1)^p-n_{1}^{p}-1 \equiv 0 \pmod{p^2}$$ $$n_{1}^{p-1} \equiv (a-1)^{p-1} \equiv 1+p \pmod{p^2}$$

Consider the following algorithm:

BEGIN

$$(n_1+1)^p-n_{1}^{p}-1 \equiv 0 \pmod{p^2}$$ $$n_{1}^{p-1} \equiv (a-1)^{p-1} \equiv 1+p \pmod{p^2}$$

Note:

$$\frac{(n_{1}+1)^p-1}{n_1+1-1}=p+n_2n_1 \equiv n_{1}^{p-1} \equiv (a-1)^{p-1} \equiv 1+p \pmod{p^2}$$

for some $n_2$.

$$\implies n_2n_1 \equiv 1 \pmod{p^2}$$ $$\implies n_{2}^{p-1} \equiv \frac{1}{n_1^{p-1}} \equiv \frac{1}{1+p} \equiv 1-p \pmod{p^2}$$

Also:

$$n_1 \equiv \frac{1}{n_2} \pmod{p} \implies n_{1}^{p} \equiv \frac{1}{n_2^{p}} \pmod{p^2}$$ $$n_1+1 \equiv \frac{1+n_2}{n_2} \pmod{p} \implies (n_{1}+1)^p \equiv \frac{(n_2+1)^p}{n_{2}^{p}} \pmod{p^2}$$

$$\implies (n_1+1)^p-n_{1}^{p}-1 \equiv \frac{1}{n_{2}^{p}}\left((n_2+1)^p-n_2^p-1\right) \equiv 0 \pmod{p^2}$$ $$\implies (n_2+1)^p-n_{2}^{p}-1 \equiv 0 \pmod{p^2}$$ $$n_{2}^{p-1} \equiv 1-p \pmod{p^2}$$

END

Repeating this will give us:

$$n_{1}^{p-1} \equiv 1+p$$ $$n_{2}^{p-1} \equiv 1-p$$ $$n_{3}^{p-1} \equiv 1+3p$$ $$n_{4}^{p-1} \equiv 1-5p$$ $$n_{5}^{p-1} \equiv 1+11p$$ $$...$$ $$n_k^{p-1} \equiv 1+\frac{p}{3}\left(1-(-2)^k\right) \pmod{p^2}$$

$$\implies q_p(n_k) \equiv \frac{1}{3}\left(1-(-2)^k\right) \pmod{p}$$

Note that when the same algorithm is used for $b^p-(b-1)^p-1 \equiv 0 \pmod{p^2}$ with $m_1=b-1$ and $k=p-1$, then:

$$n_k^p \equiv n_k \equiv (b-1)^p \equiv b^p-1 \pmod{p^2}$$ $$m_k^p \equiv m_k \equiv (a-1)^p \equiv a^p-1 \pmod{p^2}$$ $\implies$

$$(n_k+1)^p \equiv n_k+1 \equiv b^p \pmod{p^2}$$ $$(m_k+1)^p \equiv m_k+1 \equiv a^p \pmod{p^2}$$ $\implies$

$$(n_km_k)^p \equiv 1 \pmod{p^3}$$ $$n_k^p \equiv (b-1)^{p^2} \pmod{p^3}$$ $$m_k^p \equiv (a-1)^{p^2} \pmod{p^3}$$ $$(n_k+1)^p \equiv b^{p^2} \pmod{p^3}$$ $$(m_k+1)^p \equiv a^{p^2} \pmod{p^3}$$

$\implies$

$$(n_km_k)^p+n_k^p+m_k^p+1 \equiv (a-1)^{p^2}+(b-1)^{p^2}+2 \equiv (ab)^{p^2} \pmod{p^3}$$ $$\implies (ab)^{2p^2} \equiv (ab)^{p^2}(a-1)^{p^2}+(ab)^{p^2}(b-1)^{p^2}+2(ab)^{p^2} \pmod{p^3}$$

Note:

$$b^{p^2}(a-1)^{p^2} \equiv a^{p^2} \pmod{p^3}$$ $$a^{p^2}(b-1)^{p^2} \equiv b^{p^2} \pmod{p^3}$$

And so:

$$\implies (ab)^{2p^2} \equiv a^{2p^2}+b^{2p^2}+2(ab)^{p^2} \pmod{p^3}$$ $$\implies (ab)^{2p^2} \equiv (a^{p^2}+b^{p^2})^2 \pmod{p^3}$$ $$\implies (ab)^{p^2} \equiv a^{p^2}+b^{p^2} \pmod{p^3}$$ $\implies$ $$a^{p^2}(b^{p^2}-1) \equiv b^{p^2} \equiv a^{p^2}(b-1)^{p^2} \pmod{p^3}$$ $$b^{p^2}(a^{p^2}-1) \equiv a^{p^2} \equiv b^{p^2}(a-1)^{p^2}\pmod{p^3}$$ $\implies$

$$a^{p^2}-(a-1)^{p^2}-1 \equiv b^{p^2}-(b-1)^{p^2}-1 \equiv 0 \pmod{p^3}$$

Now if we set as parameters:

$$n_1=(a-1)^p$$ $$a^{p^2}-(a-1)^{p^2}-1 \equiv (n_1+1)^{p^2}-n_1^{p^2}-1 \equiv 0 \pmod{p^3}$$ $$(a-1)^{p(p-1)} \equiv 1+p^2 \pmod{p^3}$$

we can repeat the algorithm again:

$$q_p(n_k) \equiv q_p(m_k) \equiv \frac{1}{3}\left(1-(-2)^k\right) \pmod{p^2}$$

If we keep repeating we get:

$$q_p(n_k) \equiv q_p(m_k) \equiv \frac{1}{3}\left(1-(-2)^k\right) \pmod{p^2}$$ $$q_p(n_k) \equiv q_p(m_k) \equiv \frac{1}{3}\left(1-(-2)^k\right) \pmod{p^3}$$ $$q_p(n_k) \equiv q_p(m_k) \equiv \frac{1}{3}\left(1-(-2)^k\right) \pmod{p^4}$$ $$...$$ $$q_p(n_k) \equiv q_p(m_k) \equiv \frac{1}{3}\left(1-(-2)^k\right) \pmod{p^l}$$

And so eventually:

$$(ab)^{p} \equiv a^{p}+b^{p} \pmod{p^2}$$ $$(ab)^{p^2} \equiv a^{p^2}+b^{p^2} \pmod{p^3}$$ $$(ab)^{p^3} \equiv a^{p^3}+b^{p^3} \pmod{p^4}$$ $$...$$ $$(ab)^{p^l} \equiv a^{p^l}+b^{p^l} \pmod{p^{l+1}}$$

In case of Fermat's last theorem this implies:

$$x^{p} \equiv y^{p}+z^{p} \pmod{p^2}$$ $$x^{p^2} \equiv y^{p^2}+z^{p^2} \pmod{p^3}$$ $$x^{p^3} \equiv y^{p^3}+z^{p^3} \pmod{p^4}$$ $$...$$ $$x^{p^l} \equiv y^{p^l}+z^{p^l} \pmod{p^{l+1}}$$

which proves the theorem.
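One way to gain confidence in individual steps of a proof like this is to check them numerically. For instance, the congruence $(1-p)^{p-1} \equiv 1+p \pmod{p^2}$ used early in the argument can be verified for small primes (a sanity check I added; it is not part of the original proof):

```python
def check(p):
    """Verify (1 - p)^(p - 1) == 1 + p (mod p^2), a step used in the proof.

    This follows from the binomial theorem:
    (1 - p)^(p-1) = 1 - (p-1)p + ... = 1 + p - p^2 + ... == 1 + p (mod p^2).
    """
    return pow(1 - p, p - 1, p * p) == (1 + p) % (p * p)

print(all(check(p) for p in (3, 5, 7, 11, 13)))  # prints True
```

The same style of check can be applied to the claimed pattern $n_k^{p-1} \equiv 1 + \frac{p}{3}\left(1-(-2)^k\right) \pmod{p^2}$ once concrete values of $n_1$ are in hand.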

Does the imaginary part of a complex number represent a down payment or theft in real life?

Posted: 20 Jun 2022 09:18 PM PDT

I am new to complex numbers and am trying to understand them by finding real-life situations they could describe, in the way that positive one apple is a gained apple and negative one apple is a lost apple. So what are $+1i$ apples and $-1i$ apples?

I know that electromagnetism, AC circuits, wave analysis, and quantum mechanics are explained with complex numbers, but I cannot hold those in my hand like apples.
So my first thought was that $+1i$ might represent the situation where two people A and B bet on the result of a match.
From the point of view of person A: A puts $-1i$ dollars with an intermediary, and B puts $+1i$ dollars with the intermediary.
For person A, I consider the dollar he gives the intermediary to be $-1i$, because if he loses, this $-1i$ dollar turns into a real $-1$ dollar,
and the dollar given by B to the intermediary is $+1i$ relative to A, because if A wins he converts this $+1i$ into a real $+1$ dollar.
I do not know whether my understanding is right, but I continued studying complex numbers and solved the following problem:
$p(x) = -0.3x^2 + 50x - 170$, where $p$ is the profit and $x$ is the number of lamps produced per week.
(Note that I do not know the price of a lamp or the cost of manufacturing, and that the profit is nonlinear.)
I then tried to find the number of lamps that yields a profit of 3000 dollars.
The solutions are the complex numbers with real part $83.3333$ and imaginary part $\pm 60.1849i$.
Then I tried computing the profit for the real part alone ($1913.3333$) and for the imaginary part alone ($916.6667$) and adding them together, but the result is $2830$, not $3000$. How is this possible?
Note that $2830 + 170 = 3000$!

While thinking about this, I calculated the imaginary part after converting it to a real number ($60.1849$). I then thought of the positive imaginary part as taking the full price in advance (a down payment of $1752.5783$) for future production, so as to get beyond the maximum weekly profit of $1913.3333$; and I thought of the negative imaginary part as stealing these 60 lamps from the deal and reselling them, which would also get past the weekly-maximum-profit barrier.
But the calculations in both cases did not equal $3000$ ($1913.3333 + 1752.5783 = 3665.9117$). Are my guesses about the meaning of the imaginary part as an advance payment or as theft correct?

I know my question may be vague; that is because I am confused about the meaning of complex numbers, and I will accept any edits that make it clearer.
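For reference, the full complex root does satisfy $p(x) = 3000$; it is only evaluating $p$ at the real and imaginary parts separately that fails, because $p$ is nonlinear. A quick numerical check (my own verification code, not part of the original question):

```python
import cmath

def p(x):
    """Profit function from the question."""
    return -0.3 * x**2 + 50 * x - 170

# p(x) = 3000  <=>  0.3 x^2 - 50 x + 3170 = 0; the discriminant is negative.
disc = cmath.sqrt(50**2 - 4 * 0.3 * 3170)
root = (50 + disc) / (2 * 0.3)
print(root)       # approximately 83.3333 + 60.1849j
print(p(root))    # approximately 3000
```

Since $p(a + bi) \ne p(a) + p(bi)$ for a quadratic, there is no reason to expect the separate evaluations to add up to 3000.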

What are the possible $2\times 2$ semisimple matrices?

Posted: 20 Jun 2022 09:09 PM PDT

I have to find all possible values of $a$, $b$, $c$, and $d$ for which the $2 \times 2$ matrix $\begin{pmatrix}a & b \\ c & d \end{pmatrix}$ is semisimple. I know that a matrix $S$ is semisimple if there is a nonsingular matrix $P$ such that $P^{-1}SP=A$ is diagonal. How can I use this to find all the values $a$, $b$, $c$ and $d$?
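As a computational aside (exploratory code of my own, not a derivation of the general criterion): SymPy can test diagonalizability directly, which is handy for experimenting with candidate values of $a$, $b$, $c$, $d$. Over an algebraically closed field, a $2 \times 2$ matrix is semisimple exactly when it has two distinct eigenvalues or is already a scalar multiple of the identity.

```python
from sympy import Matrix

# Distinct eigenvalues 1 and 2: diagonalizable.
print(Matrix([[1, 1], [0, 2]]).is_diagonalizable())   # True
# Scalar matrix: trivially diagonal.
print(Matrix([[2, 0], [0, 2]]).is_diagonalizable())   # True
# Nontrivial Jordan block: repeated eigenvalue 1 but not scalar, so not semisimple.
print(Matrix([[1, 1], [0, 1]]).is_diagonalizable())   # False
```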

Graph Theory + Dynamical Systems

Posted: 20 Jun 2022 09:04 PM PDT

Suppose you had a dynamical system $\dot{\vec{x}} = \vec{f}(\vec{x})$. In principle, one could represent it as a directed graph whose vertices are the fixed points of the system and whose edges are the orbits between them.

I've read about book embeddings of graphs, but other than it being just a nice visual representation, does a book embedding tell you anything about the dynamical system?

Likewise, I've been reading about the adjacency matrix of a graph. It's a nice way to convert a graph into a matrix, an object about which a great deal of algebra is known, but does anything about this matrix (e.g. its determinant or eigenvalues) give insight into the dynamical system? I've seen papers that give bounds on the determinants of graphs, but it doesn't sound like those bounds carry any information about the dynamical system.

Comparing graphs and dynamical systems seems interesting, but to me a graph just looks like a nice visualization of a system's behavior, and I'm not sure what quantitative information can be extracted from it. I'm not a graph theorist, so I'm not familiar with this field, but I'd be quite interested in any quantitative relationships between a graph and a dynamical system.

Linear Algebra and planes in Cartesian space

Posted: 20 Jun 2022 08:02 PM PDT

I was asked this question in the course Linear Algebra, and I need to show all my working.

The question is in 5 parts:

Consider the $xyz$-space $\mathbb{R}^3$ with origin $O$. Let $l$ be the line given by the Cartesian equations $$x = \frac{z - 1}2, \quad y = 1$$ Let $p$ be the plane given by the Cartesian equation $$2 x + y - z = 1$$

a) Find two unit vectors parallel to the line l.

b) Find the point Q which is the intersection of the plane p and z-axis.

c) Take n = 2 i + j - k as a normal vector of the plane p. Decompose the vector QO into the sum of two vectors: one of them is parallel to n and the other one is orthogonal to n.

d) The plane p divides R3 into two parts. Find the unit vector perpendicular to p and pointing into the part containing the origin O.

e) Let P(x, y, z) be a point on the line l. Letting x = t for some constant t, find the y and z coordinates of P. Calculate the distance from P to the plane p.
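A numerical sketch of parts (a) and (b), under my reading of the equations (setting $x = t$ gives the parametrization $(x, y, z) = (t, 1, 1 + 2t)$ of $l$; this is illustrative scratch work, not a full solution):

```python
import numpy as np

# (a) As t varies, points of l are (t, 1, 1 + 2t), so a direction vector is:
d = np.array([1.0, 0.0, 2.0])
unit = d / np.linalg.norm(d)
print(unit, -unit)                     # the two unit vectors parallel to l

# (b) Intersect 2x + y - z = 1 with the z-axis (x = y = 0): -z = 1, so z = -1.
Q = np.array([0.0, 0.0, -1.0])
print(2 * Q[0] + Q[1] - Q[2])          # equals 1, confirming Q lies on p
```

The remaining parts follow the same pattern: project $\vec{QO}$ onto n for (c), choose the sign of n/|n| that points toward $O$ for (d), and apply the point-plane distance formula for (e).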

I would like to thank everyone who takes time in helping me with this problem and I really appreciate the help.

Thanks again.
