quaternion_algorithm.txt

Observation 1: A quaternion encodes more than the position of the axis of the telescope. Since it encodes the position of a rigid body, it encodes the position of the optical axis (i.e. pitch, yaw) as well as the position of the focuser (i.e. roll). In other words it tells you how to move the telescope and also how to rotate it. We are interested in the pitch/yaw but not interested in the roll. We also only get the pitch/yaw from the plate solve. In principle we could calculate the roll, but it would depend on whether the scope was mounted equatorially, horizontally, or a dob on a equatorial platform, or... many other possibilites. So it is best to assume that we only know the pitch and yaw. So data in a plate solve < data in a quaternion.

Note 2: The unit quaternion q = cos(a/2) + v sin(a/2) is a mathematical object that represents (similar to a rotation matrix), a rotation through the angle a about the axis given by the unit vector v. It is a matter of notational convenience that we may write the vector v in terms of i, j, k basis vectors and the quaternion q in terms of i, j, k, basis quaternions, which leads to this equation -- in reality, vectors and quaternions are completely different entities. (For example, the vector i is the unit vector along the X axis, whereas the quaternion i represents a 180° rotation about the X-axis)

Note 3: Under the rotation defined by quaternion q, the vector v transforms into the vector v' = q v q¯¹ (not sure if it is q v q¯¹ or q¯¹ v q but we can assume that as a convention by redefining which way we rotate; I will stick to the q v q¯¹ format). Once again, the objects q and v are entirely different mathematical objects, but the computations work if we expand v and q in terms of the same i, j, k units.

Lemma 4 (Frame change for quaternions): A quaternion is expressed in a frame (basis i, j, k). Let us make a frame change from a different "IMU" frame (i', j', k') to our favorite "ground" frame (i, j, k). This is also a rotation, so it too, can be expressed by a quaternion. Let us call this quaternion p and express it in the (i, j, k) frame. Then, a quaternion q' expressed in the (i', j', k') "IMU" frame may be written in the (i, j, k) "ground" frame as: q = p q' p¯¹
Proof: Write q' in its axis-angle form as q' = cos(a/2) + sin(a/2) v' where v' is the vector around which q' rotates, expressed in the "IMU" frame. Then, we see that
p q' p¯¹ = p cos(a/2) p¯¹ + sin(a/2) p v' p¯¹
cos(a/2) is a scalar quantity, so the multiplication of p and it is commutative, so we get p p¯¹ in the first term, which is 1 (because rotations are unit quaternions). Then, we notice that p v' p¯¹ is simply the rotation of v' from the IMU frame to the "ground" frame (using Note 3). This combination p q' p¯¹ therefore represents the same rotation about the same vector, just expressed in the "ground" frame.

Lemma 5: The difference between two rotations can be computed by quaternion inverse (just like rotation matrices). If we want to undo the rotation given by unit quaternion s and do the rotation given by t, this is given by the unit quaternion t s¯¹. Justification: Use Note 3.

Okay, now for the problem:

Let us define the "ground" frame as having X-axis pointing to cardinal north point (not NCP, but along the horizon), and Z-axis pointing to the zenith. By convention, it is a right-handed frame, so the Y-axis points to the west. Let the "IMU" frame be the frame in which the IMU reports its orientation. For an accel+gyro fusion, this has the Z'-axis pointing to the zenith, and X'-axis arbitrary (usually, heading when it was turned on). For accel+gyro+mag fusion, this has the Z'-axis pointing to the zenith AND the X'-axis pointing to the north as estimated by magnetometer. For pure gyro, this is probably just the frame the corresponds to the starting orientation of the IMU when it was turned on. In the first case, it differs from the "ground" frame by one parameter (an unfixed azimuth offset A₀). In the second case, it differs mostly by some tiny error in azimuth (so it can pretty much be treated same as the first case). In the third case, it differs from the "ground" frame by an arbitrary rotation, i.e. an arbitrary unit quaternion (which is three angles worth of info). This is assuming the user does not deliberately paste the IMU on the scope so as to align its axis with the telescope's canonical axis and start it up when the telescope is in some "parked" position etc.

The scope moves from point 1 to point 2 in the sky. As a result, the optical axis moves from v₁ to v₂ (optic axis is modeled by a unit vector). The scope's full transformation (including rotation of the focuser, so to speak) is a rigid-body rotation which is described by a pair of quaternions which we have no way of determining (unless the IMU's initial frame is aligned to this frame) -- so we leave them out of the discussion. Plate solving gives us the unit vectors v₁ and v₂ (assuming we know latitude and LST so we can convert RA/Dec to Alt/Az and therefore determine the optic axis vector). The IMU gives us a reading at each of the two points (either Euler angles, or quaternion -- we shall prefer quaternions for this discussion as they are simpler to manipulate), but they are referenced to the "IMU" frame, which we don't know. Let us denote the two quaternions produced by the IMU as q₁' and q₂' at these points. They are in the "IMU" frame. The readings may be converted into the "ground" frame as q₁ = p q₁' p¯¹, and q₂ = p q₂' p¯¹ per Lemma 4. p is fully known (and is 1) only if we are using accel+gyro+mag fusion and the magnetometer is perfectly calibrated. If we are using accel+gyro or if the mag is uncalibrated, then we can write p in terms of the unknown azimuth offset as p = cos(A₀/2) + k sin(A₀/2). If we are in pure gyro config (or we want to account for local gravity fluctuations / e-terms), p is an arbitrary unit quaternion which we do not know. Thus, the data we have is:

Data point 1: optic axis v₁ corresponds to IMU orientation p q₁' p¯¹
Data point 2: optic axis v₂ corresponds to IMU orientation p q₂' p¯¹
Unknowns: p
Now the task at hand is to predict v(t) given q'(t). For this we must determine p.

However, we don't know the full rotation (i.e. we are missing the roll) of the telescope (unless we know how it is mounted). So the best we can do is to say that the difference quaternion maps the direction v₁ to the direction v₂, i.e. (p q₂' p¯¹)(p q₁' p¯¹)¯¹ v₁ (p q₁' p¯¹)(p q₂' p¯¹)¯¹ = v₂
We can slightly simplify this equation using (AB)¯¹ = B¯¹ A¯¹, to get p q₂'q₁'¯¹p¯¹v₁pq₁'q₂'¯¹p¯¹ = v₂, which is an equation for the three unknowns in p. To simplify our life a bit, let us define y := q₂'q₁'¯¹. Then, (p y p¯¹) v₁ (p y p¯¹)¯¹ = v₂.

Now this is a set of 3 equations (vectors have i, j, k components) for the 3 unknowns in p. So it looks like they might be solved, but let's try.

First, let's define X = (p y p¯¹) for convenience. Then we can write the equation as the following two "step-wise" equations:
Equation 1: X v₁ X¯¹ = v₂ (Step 1: Given v₁ and v₂, solve for X)
Equation 2: p y p¯¹ = X (Step 2: Given y and X, solve for p)

Now, the second equation has a unique solution, because y and X are quaternions. This is a special case of https://en.wikipedia.org/wiki/Sylvester_equation but actually we don't need all the complicated Sylvester algorithms because it's just 4 components. We can actually just substitute expressions for the quaternions in terms of their components, put in the constraint that they are unit quaternions, and solve it by brute force to get algebraic equations. It's a pain from an algebraic standpoint, but it can be done.

But the first equation is more complicated. It looks identical to the second one, but it does not have a unique solution, because v₁ and v₂ are vectors (and not quaternions!). Let me explain why intuitively. The first equation is effectively saying "find me the rotation (matrix / quaternion) that converts vector v₁ into vector v₂". We know that this is not unique because we have not specified the roll. Let's go back to our physical example to make this intuitive. The optic axis of the telescope can move from direction v₁ to direction v₂ in a myriad of ways -- it can do so without changing the angle the focuser makes with the horizontal plane like a dob does, or it can do so like a German equatorial telescope, or it could do so in an alt-alt fashion -- so the exact rigid body rotation of the scope depends on how the focuser rotates, not just the motion of the optic axis. In other words, it depends on the details of how the telescope is mounted. This problem does not plague Equation 2, because X and y are quaternions, so the full rotation data is present.

Now, if we had some constraint on what kind of rotation X could be, we could solve that. This depends on if we have some sort of constraint on what kind of rotation p is, because Equation 2 will translate that into a constraint imposed on X. But as far as we are concerned for now, p is arbitrary, so in principle by choosing different quaternions for p I could generate arbitrary quaternions for X (through equation 2), so there is no constraint on X that will help solve Equation 1.

So there are many solutions to this problem:
(1) Make an assumption of how the telescope tube actually rotates (including roll), e.g. equatorial mounting / altaz mounting etc, thereby specifying more than a vector equation for (1)
(2) Make an assumption on the nature of the "IMU" frame defined through p, thereby constraining X.
(3) See if we can solve the system without a third data point (get another equation)

I haven't tried (1) or (3). The reason is that (1) is especially complicated for my case, which is an alt/az scope on an equatorial platform. I haven't tried (3) because I wanted to keep the alignment process simple. But I have applied in the above, a special case of (2). The assumption I have made is that p is a rotation in the horizontal X-Y plane, i.e. around the Z-axis, i.e. p = cos(A₀/2) + k sin(A₀/2) -- which corresponds to assuming that the accelerometer is good, but the magnetometer is not.

If we make this assumption, p is no longer an arbitrary quaternion, which means that X is no longer an arbitrary quaternion. So we are now solving a well-spec'd system. So the way I solved it to derive the "estimation" algorithm, was to write y in terms of its components (y₀, y₁, y₂, y₃), then multiplying it out by the assumed p to find an expression for X in terms of the one unknown A₀ and the known components of y. This turns out to simplify to:

X = (y₀, y₁ cos A₀ + y₂ sin A₀, y₂ cos A₀ - y₁ sin A₀, y₃)

Then we can plug this X into equation (1), write the vectors v₂ and v₁ in terms of altitude and azimuth, and brute-force expand all the terms, equate and simplify, which I did over several pages of work, to obtain the estimation equations I listed to obtainA₀.

I think the other routes are also possible, i.e. make assumptions about the mount, or try to solve using more data points; but since the Z-axis was already quite reliable, I decided to use that.