Try an interactive version of this dialog: Sign up at solve.it.com, click Upload, and pass this URL.

Alberto Pepe Dissertation - Full

Building the Dialog

I want to make a solveit dialogue for Alberto Pepe's dissertation, which I originally had as a PDF.

After creating a free account on DataLab.io, I used it to generate Markdown and HTML versions of the dissertation, and uploaded them to solveit as pepe-diss.md and pepe-diss.html. The HTML version includes figures!

Hi, can you help me build a solveit dialogue so I can do a deep reading of this dissertation? I'd like any section headings or headers to signal subdivision into new separate blocks. I'd like the figures to be included, along with all charts and equations.

Data Lab already did a good job of converting the math, so there's one set of it in the Markdown and another in the HTML. Since you're mostly going to be building Markdown, I suspect you could import from the Markdown for the most part and only turn to the HTML to find the figures to fill in.

Does that make sense? Do you think we can get started? You're going to create many, many new note cells, and I suggest you always append to the end of the dialogue so they don't end up in some weird other order.

This may take so long that you hit timeout cancellation of your operations, in which case the task will be only partially completed.

So you'll need some way to resume from your previous progress based on which notes have already been added to this dialogue. We don't need to re-add the same note twice; we can skip over any notes that have already been added and get to the good stuff.

I suggest you first output a description of your strategy for how you're going to tackle this. It's a long dissertation, with many figures and tons of mathematics; the PDF version is 228 pages. So write out your strategy first and then pause, and I will read your strategy and tell you when to execute it.

As I say, Data Lab has already done the heavy lifting for you. You don't need to do any OCR; you can go directly to the Markdown and/or HTML. Again, I think favoring the Markdown for most things and only using the HTML for the figures may be a good plan, but it's up to you.

Let me first get a sense of the files we're working with.

view(path='.')→'Directory contents of /app/data/papers:\n…'
{
  "id": "toolu_01AkSLBSUqrY7ADxgptpdihP",
  "call": {
    "function": "view",
    "arguments": {
      "path": "."
    }
  },
  "result": "Directory contents of /app/data/papers:\n/app/data/papers/pepe-diss.html (5565.7k)\n/app/data/papers/pepe-diss.md (586.0k)\n/app/data/papers/midi-rae \ud83d\udcc1\n/app/data/papers/pepe-diss.ipynb (2.8k)\n/app/data/papers/midi-rae/probe_musicality.py (39.8k)\n/app/data/papers/midi-rae/probe_summary.png (38.8k)\n/app/data/papers/midi-rae/emb_pca_scatters.png (1261.5k)\n/app/data/papers/midi-rae/probe_time_translation.png (82.6k)\n/app/data/papers/midi-rae/probe_equivariance.zip (536.1k)\n/app/data/papers/midi-rae/paper_talk2.ipynb (2508.5k)\n/app/data/papers/midi-rae/mldropout.png (33.1k)\n/app/data/papers/midi-rae/probe_conditioning.py (22.8k)\n/app/data/papers/midi-rae/probe_equivariance \ud83d\udcc1\n/app/data/papers/midi-rae/probe_transposition.png (78.2k)\n/app/data/papers/midi-rae/midi_rae_ismir2026.pdf (1823.0k)\n/app/data/papers/midi-rae/paper_talk.ipynb (1423.2k)\n/app/data/papers/midi-rae/cond_proj_norm.png (107.5k)\n/app/data/papers/midi-rae/midi_rae_refs.bib (5.3k)\n/app/data/papers/midi-rae/midi_rae_ismir2026.tex (35.6k)\n/app/data/papers/midi-rae/real_matched_etc.png (46.4k)\n/app/data/papers/midi-rae/probe_equivariance/probe48_nofact \ud83d\udcc1\n/app/data/papers/midi-rae/probe_equivariance/probe_48fact \ud83d\udcc1\n/app/data/papers/midi-rae/probe_equivariance/dino \ud83d\udcc1\n/app/data/papers/midi-rae/probe_equivariance/probe48_nofact/probe_summary.png (38.6k)\n/app/data/papers/midi-rae/probe_equivariance/probe48_nofact/probe_time_translation.png (86.0k)\n/app/data/papers/midi-rae/probe_equivariance/probe48_nofact/probe_time_translation.csv (6.9k)\n/app/data/papers/midi-rae/probe_equivariance/probe48_nofact/probe_transposition.csv (2.9k)\n/app/data/papers/midi-rae/probe_equivariance/probe48_nofact/stormbird_results.log (2.9k)\n/app/data/papers/midi-rae/probe_equivariance/probe48_nofact/probe_transposition.png (79.7k)\n/app/data/papers/midi-rae/probe_equivariance/probe_48fact/probe_summary.png 
(38.4k)\n/app/data/papers/midi-rae/probe_equivariance/probe_48fact/probe_time_translation.png (80.0k)\n/app/data/papers/midi-rae/probe_equivariance/probe_48fact/probe_time_translation.csv (6.9k)\n/app/data/papers/midi-rae/probe_equivariance/probe_48fact/probe_transposition.csv (2.9k)\n/app/data/papers/midi-rae/probe_equivariance/probe_48fact/stormbird_results.log (2.9k)\n/app/data/papers/midi-rae/probe_equivariance/probe_48fact/probe_transposition.png (85.8k)\n/app/data/papers/midi-rae/probe_equivariance/dino/probe_summary.png (35.9k)\n/app/data/papers/midi-rae/probe_equivariance/dino/probe_time_translation.png (57.9k)\n/app/data/papers/midi-rae/probe_equivariance/dino/probe_time_translation.csv (2.3k)\n/app/data/papers/midi-rae/probe_equivariance/dino/probe_transposition.csv (0.9k)\n/app/data/papers/midi-rae/probe_equivariance/dino/stormbird_results.log (2.0k)\n/app/data/papers/midi-rae/probe_equivariance/dino/probe_transposition.png (60.3k)"
}

Good — the markdown is ~586KB and the HTML is ~5.5MB (likely because of embedded figures). Let me look at the structure of the markdown to understand the heading hierarchy.

pyrun(code="import re\nmd = Path('/app/data/papers/pe…")→'----------------------------------------…'
{
  "id": "toolu_017o2pocTMGiiGHZmsg1H75C",
  "call": {
    "function": "pyrun",
    "arguments": {
      "code": "import re\nmd = Path('/app/data/papers/pepe-diss.md').read_text()\n# Extract all headings\nheadings = [(m.start(), m.group()) for m in re.finditer(r'^(#{1,6})\\s+(.+)$', md, re.MULTILINE)]\n# Show first 80 headings with their line numbers\nlines = md.split('\\n')\nheading_lines = []\nfor i, line in enumerate(lines):\n    m = re.match(r'^(#{1,6})\\s+(.+)$', line)\n    if m:\n        heading_lines.append((i+1, len(m.group(1)), m.group(2).strip()))\n\nf\"Total headings: {len(heading_lines)}, Total lines: {len(lines)}\""
    }
  },
  "result": "---------------------------------------------------------------------------\nNameError                                 Traceback (most recent call last)\nCell In[7], line 1\n----> 1 await maybe_await(pyrun(code='\\nimport re\\nmd = Path(\\'/app/data/papers/pepe-diss.md\\').read_text()\\n# Extract all headings\\nheadings = [(m.start(), m.group()) for m in re.finditer(r\\'^(#{1,6})\\\\s+(.+)$\\', md, re.MULTILINE)]\\n# Show first 80 headings with their line numbers\\nlines = md.split(\\'\\\\n\\')\\nheading_lines = []\\nfor i, line in enumerate(lines):\\n    m = re.match(r\\'^(#{1,6})\\\\s+(.+)$\\', line)\\n    if m:\\n        heading_lines.append((i+1, len(m.group(1)), m.group(2).strip()))\\n\\nf\"Total headings: {len(heading_lines)}, Total lines: {len(lines)}\"\\n'))\n\nFile /usr/local/lib/python3.12/site-packages/fastcore/xtras.py:1030, in maybe_await(o)\n   1028 \"Await `o` if needed, and return it\"\n   1029 from inspect import isawaitable\n-> 1030 return await o if isawaitable(o) else o\n\nFile /usr/local/lib/python3.12/site-packages/safepyrun/core.py:339, in RunPython.__call__(self, code)\n    337 tb = e.__traceback__\n    338 while tb.tb_next and not tb.tb_frame.f_code.co_filename.startswith('<pyrun'): tb = tb.tb_next\n--> 339 raise e.with_traceback(tb) from None\n\nFile <pyrun_0>:2\n      1 import re\n----> 2 md = Path('/app/data/papers/pepe-diss.md').read_text()\n      3 headings = [(m.start(), m.group()) for m in re.finditer('^(#{1,6})\\\\s+(.+)$', md, re.MULTILINE)]\n      4 lines = md.split('\\n')\n\nNameError: name 'Path' is not defined"
}

Need to import Path. Let me fix that.

pyrun(code='from pathlib import Path\nimport re\nmd = …')→'----------------------------------------…'
{
  "id": "toolu_01Q8yPXr2oTosFntQ4opsfY6",
  "call": {
    "function": "pyrun",
    "arguments": {
      "code": "from pathlib import Path\nimport re\nmd = Path('/app/data/papers/pepe-diss.md').read_text()\nlines = md.split('\\n')\nheading_lines = []\nfor i, line in enumerate(lines):\n    m = re.match(r'^(#{1,6})\\s+(.+)$', line)\n    if m:\n        heading_lines.append((i+1, len(m.group(1)), m.group(2).strip()))\n\nf\"Total headings: {len(heading_lines)}, Total lines: {len(lines)}\""
    }
  },
  "result": "---------------------------------------------------------------------------\nPermissionError                           Traceback (most recent call last)\nCell In[8], line 1\n----> 1 await maybe_await(pyrun(code='\\nfrom pathlib import Path\\nimport re\\nmd = Path(\\'/app/data/papers/pepe-diss.md\\').read_text()\\nlines = md.split(\\'\\\\n\\')\\nheading_lines = []\\nfor i, line in enumerate(lines):\\n    m = re.match(r\\'^(#{1,6})\\\\s+(.+)$\\', line)\\n    if m:\\n        heading_lines.append((i+1, len(m.group(1)), m.group(2).strip()))\\n\\nf\"Total headings: {len(heading_lines)}, Total lines: {len(lines)}\"\\n'))\n\nFile /usr/local/lib/python3.12/site-packages/fastcore/xtras.py:1030, in maybe_await(o)\n   1028 \"Await `o` if needed, and return it\"\n   1029 from inspect import isawaitable\n-> 1030 return await o if isawaitable(o) else o\n\nFile /usr/local/lib/python3.12/site-packages/safepyrun/core.py:339, in RunPython.__call__(self, code)\n    337 tb = e.__traceback__\n    338 while tb.tb_next and not tb.tb_frame.f_code.co_filename.startswith('<pyrun'): tb = tb.tb_next\n--> 339 raise e.with_traceback(tb) from None\n\nFile <pyrun_1>:9\n      7 m = re.match('^(#{1,6})\\\\s+(.+)$', line)\n      8 if m:\n----> 9     heading_lines.append((i + 1, len(m.group(1)), m.group(2).strip()))\n\nFile /usr/local/lib/python3.12/site-packages/safepyrun/core.py:161, in _ReadOnlyCallable.__call__(self, *args, **kwargs)\n    159 n = object.__getattribute__(self, '_name')\n    160 if n: raise PermissionError(f\"Calling `{n}` is not permitted; use `allow()` to add it\") from None\n--> 161 raise PermissionError(f\"Calling {type(object.__getattribute__(self, '_obj')).__name__} not allowed in sandbox\")\n\nPermissionError: Calling builtin_function_or_method not allowed in sandbox"
}

The sandbox doesn't allow .group(). Let me use rg instead to extract headings from the markdown.
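(As an aside, a pure-Python route that avoids match-object method calls is `re.findall`, which returns plain tuples of strings rather than match objects — a minimal sketch on toy text, not the actual dissertation:)

```python
import re

toy = "# Title\nbody\n## Section 1\nmore body\n"
# findall with capture groups yields tuples of strings, so no .group() calls
headings = re.findall(r'^(#{1,6})\s+(.+)$', toy, re.MULTILINE)
print(headings)  # [('#', 'Title'), ('##', 'Section 1')]
```

That may have sidestepped the sandbox restriction, but rg is fine too.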

rg(argstr='-n ^#{1,3}\s /app/data/papers/pepe-diss.…')→'3:# Machine Learning with Geometric Alge…'
{
  "id": "toolu_011orbhN2AyP45s6pWg9YXzH",
  "call": {
    "function": "rg",
    "arguments": {
      "argstr": "-n ^#{1,3}\\s /app/data/papers/pepe-diss.md"
    }
  },
  "result": "3:# Machine Learning with Geometric AlgebraMultivectors for Modelling, Understanding and Computing\n31:# Declaration\n40:# Acknowledgements\n76:# Abstract\n86:# Table of contents\n226:# List of figures\n400:# List of tables\n452:# Nomenclature\n454:## Number Sets\n464:## Mathematical Symbols\n563:# Chapter 1\n565:## Introduction and Background\n576:### 1.1 Introduction\n610:## 1.2 Fundamentals of GA\n614:### 1.2.1 Defining a subalgebra\n618:### 1.2.2 The geometric product\n654:### 1.2.3 Geometric Algebra of the plane and of the space\n672:## 1.2.4 Geometric transformations\n676:### Reflections\n694:## Rotations\n740:### 1.2.5 Conformal Geometric Algebra (CGA)\n764:### 1.2.6 Transformations in CGA\n794:### Translations\n818:### Dilations\n841:### 1.2.7 Objects and incidence\n851:### Planes\n857:### Circles\n873:### Spheres\n896:### Intersections\n924:## 1.2.8 Other algebras\n926:### 1d-Up CGA\n932:### Spacetime Algebra (STA)\n960:### 1.3 Learning with GA\n989:### 1.3.1 GA: the 6<sup>th</sup> \u201cG\u201d of Geometric Deep Learning?\n1014:### 1.3.2 Learning with GA in practice\n1034:# Chapter 2\n1036:## Rotations as Rotors\n1044:### 2.1 Introduction\n1052:## 2.2 Learning rotations\n1054:### 2.2.1 Background\n1062:### 2.2.2 The 6D representation\n1077:### 2.2.3 Learning mappings between representation spaces\n1121:### GA-based Sanity Check\n1138:## 2.2.4 Pose estimation of 3D point clouds\n1203:## Noisy Point Clouds\n1211:### 2.2.5 Inverse kinematics\n1275:## 2.3 Molecular geometry optimisation through rotors\n1277:### 2.3.1 Background\n1293:### 2.3.2 Molecules as rotors\n1323:### 2.3.3 The evolutionary algorithm\n1382:### 2.3.4 Examples\n1426:### The Gly - Phe dipeptide\n1492:## 2.4 Conclusions\n1502:# Chapter 3\n1504:## Protein Structure Prediction (PSP)\n1515:### 3.1 Introduction\n1523:### 3.2 Protein structures as 3D CGA planes\n1547:### 3.2.1 Geometrical interpretation of the cost function\n1569:### 3.3 Cost maps: a geometric description of amino 
acids\n1591:### 3.3.1 Cost maps and secondary structures\n1629:### 3.3.2 Cost maps and different protein conformations\n1665:### 3.3.3 Cost maps and NMR ensembles\n1694:### 3.4 Alternative representations\n1698:### 3.4.1 Dihedral angles\n1730:### 3.4.2 Dot product between oriented points\n1772:### 3.5 Dataset: *PDNET*\n1800:## 3.6 Estimating orientational features\n1824:### 3.6.1 Training details\n1840:## 3.7 PSP with GA features\n1842:### 3.7.1 Architecture: Graph Transformer + 3D projector\n1852:## The Graph Transformer\n1873:## The 3D Projector\n1881:### 3.7.2 Training details\n1905:### 3.7.3 Alignment of predicted coordinates to ground truth\n1911:### 3.7.4 Results\n2044:## Geometrical meaning of features\n2143:## 3.8 PSP with equivariant layers in 3D GA\n2147:### 3.8.1 GA-equivariant 3D projector\n2197:### 3.8.2 Triple-wise fully connected geometric product layer\n2215:### 3.8.3 Results\n2286:## 3.9 Conclusions\n2317:# Chapter 4\n2319:## 3D Pose Estimation\n2328:### 4.1 Introduction\n2338:## 4.2 Camera Pose Estimation via CGAPoseNet+GCAN\n2340:### 4.2.1 Background\n2352:## 4.2.2 Problem definition\n2396:## 4.2.3 Method\n2398:### 4.2.4 Poses as motors in 1d-Up CGA\n2460:## 4.2.6 Experiments\n2462:### 4.2.7 Datasets: *Cambridge Landmarks and 7 Scenes*\n2524:## 4.2.10 Results\n2658:### 4.3 3D Line Alignment with *Define, Refine, Align*\n2680:### 4.3.1 Related work on partial-to-partial registration\n2700:### 4.3.2 Method\n2714:### 4.3.3 3D Lines as bivectors in 1d-Up CGA\n2762:### 4.3.4 Architecture: attentional, rotational and equivariant modules\n2856:### 4.3.5 Datasets: *Structured3D* and *Semantic3D*\n2868:### 4.3.6 Error metrics\n2925:### 4.3.7 Implementation details\n2937:### 4.3.8 Results\n2959:## 4.4 Conclusion\n2971:# Chapter 5\n2973:## Partial Differential Equations (PDEs)\n2985:### 5.1 Introduction\n2997:## 5.2 Background\n3015:## Part I: Computational Fluid Dynamics\n3017:## 5.3 GA-ReLU: an activation function for Navier-Stokes PDEs\n3021:### 5.3.1 
Problem definition\n3127:### 5.3.2 Data generation\n3131:### 5.3.3 Training details\n3135:### 5.3.4 Experiments\n3141:### 5.3.5 Results\n3159:## 5.4 Fengbo: a neural operator for 3D PDEs over irregular geometries\n3175:### 5.4.1 Method\n3189:### 5.4.2 Architecture\n3285:### 5.4.3 Experiments\n3311:## Results\n3459:### 5.4.5 Training details\n3473:### 5.4.6 Ablation study\n3681:### 5.4.7 Notes on computational complexity\n3734:## Part II: Electromagnetism\n3736:## 5.5 STAResNet: working in STA to solve Maxwell's PDEs\n3742:### 5.5.1 Problem definition\n3764:### Maxwell's equations in $\\mathcal{G}_{3,0,0}$\n3782:### Maxwell's equations in $\\mathcal{G}_{1,3,0}$\n3867:### 5.5.2 Architecture: a ResNet in STA\n3879:### Training details\n3928:### 5.5.3 Experiments in 2D\n3965:### Impact of obstacles\n4007:### Impact of number of parameters\n4039:## Impact of rollout\n4051:### 5.5.4 Experiments in 3D\n4185:## 5.6 Conclusions\n4199:# Chapter 6\n4201:## Conclusions\n4211:### 6.1 Conclusions\n4229:## 6.2 Contributions\n4243:## 6.3 Future Work...\n4251:## 6.4 ...and Beyond\n4261:# References"
}

Excellent — this gives me a very clear picture. The dissertation has a rich heading structure with ~150+ headings across 6 chapters plus front matter. Let me also check how figures appear in the markdown vs HTML, and the total line count.

rg(argstr='-c -n \!\[.*\]\( /app/data/papers/pepe-d…')→'regex parse error:\n \!\[.*\]\(\n ^^…'
{
  "id": "toolu_01BU8a6bqrS4V6aB8EZMq1Rq",
  "call": {
    "function": "rg",
    "arguments": {
      "argstr": "-c -n \\!\\[.*\\]\\( /app/data/papers/pepe-diss.md"
    }
  },
  "result": "regex parse error:\n    \\!\\[.*\\]\\(\n    ^^\nerror: unrecognized escape sequence"
}
rg(argstr='-c "<img" /app/data/papers/pepe-diss.htm…')→123
{
  "id": "toolu_01B3ymEJsg19hfGhLqsrHYka",
  "call": {
    "function": "rg",
    "arguments": {
      "argstr": "-c \"<img\" /app/data/papers/pepe-diss.html"
    }
  },
  "result": "123"
}
sed(argstr='-n $= /app/data/papers/pepe-diss.md')→4479
{
  "id": "toolu_01DeeXBuYxZppsgUrWLbw22i",
  "call": {
    "function": "sed",
    "arguments": {
      "argstr": "-n $= /app/data/papers/pepe-diss.md"
    }
  },
  "result": "4479"
}

OK so: the markdown is 4479 lines, the HTML has 123 <img> tags. Let me check how images appear in the markdown.

rg(argstr='-n "!\[" /app/data/papers/pepe-diss.md')→'5:![University of Cambridge crest](935ee…'
{
  "id": "toolu_01LqHu8PCynUUMBNuWfe3F8L",
  "call": {
    "function": "rg",
    "arguments": {
      "argstr": "-n \"!\\[\" /app/data/papers/pepe-diss.md"
    }
  },
  "result": "<TRUNCATED>\u2026sity of Cambridge crest](935eed7aa61f7777f62cfc032e11bee9_img.jpg)\n658:![Figure 1.1: Elements of the Geometric Algebra (GA) of Euclidean space. The diagram shows the progression from a scalar (a dot) to three vectors (e1, e2, e3) to three bivectors (e12, e23, e13) and finally to a trivector (e123).](02f9c911b69504d90bd20e0bc61c4bbb_img.jpg)\n682:![A 3D diagram illustrating reflection in Geometric Algebra. A magenta plane labeled \u03a0 is shown. A cyan plane labeled \u03a0' is its reflection. A unit normal vector n is shown as a black arrow pointing from the magenta plane towards the cyan plane. The reflection is performed across a plane with unit normal vector n.](9612f08d343bcc2a11b174db64ba7b3a_img.jpg)\n700:![Diagram illustrating rotation in Geometric Algebra. A multivector \u03a0 (red plane) is rotated to \u03a0'' (green plane) by applying two successive reflections across planes with unit normals n and m. The rotation is given by \u03a0'' = mn\u03a0nm = R\u03a0R\u0303, where R = mn is the rotor. The diagram shows the planes \u03a0, \u03a0', and \u03a0'' in a 3D coordinate system, with the rotor R indicated by a curved arrow.](419237decd8294588ca7ca2aa84b0e75_img.jpg)\n724:![A 3D visualization showing a 90-degree rotation of geometric objects. A blue vertical plane and a blue circle are shown, along with an orange horizontal plane and an orange circle. A blue vector points upwards, and an orange vector points to the left. Two points, one blue and one orange, are also shown. The rotation is indicated by a curved arrow around the vertical axis.](f3ce2d7158eb708e3487b8e35415db35_img.jpg)\n786:![A 2D coordinate system showing a blue vector 'a = e1 + 2e2' starting at the origin (0,0) and pointing to the point (1,2). 
A red vector, labeled 'F(a_inv) = a/a^2 = eF(a)e', starts at the origin and points to the point (0.2, 0.4), which is the inversion of vector 'a'.](1c2028183a35357e7238438a4af9cab7_img.jpg)\n812:![A 3D diagram illustrating a translation in Conformal Geometric Algebra. A pink plane labeled \u03a0 is shown at the bottom, and a green plane labeled \u03a0' is shown above it. An arrow labeled T_a points from the pink plane to the green plane, representing the translation vector. The planes are shown within a 3D coordinate system with gray axes.](db267ff9c1b97bbae0cb0856be1d8734_img.jpg)\n835:![A 3D visualization showing a dilation in Conformal Geometric Algebra. A small yellow sphere labeled \u03a3 is centered on a light gray plane. A larger teal sphere labeled \u03a3' is also centered on the same plane, representing the result of dilating \u03a3 by a scale factor \u03b1 = 6. The dilation is performed via the sandwich product \u03a3' = D_\u03b1 \u03a3 D\u0303_\u03b1, with D_\u03b1 = exp(\u03b1/2 e \u0113).](fef13e705ab28b357c22ed6444dde1a2_img.jpg)\n865:![Figure 1.8: A 3D diagram showing a line L and a circle C on a plane. The line L is labeled L = X_1 \\wedge X_2 \\wedge n_\\infty and the circle C is labeled C = X_1 \\wedge X_2 \\wedge X_3.](640118899584f24b3d6373431a6c7722_img.jpg)\n877:![Figure 1.9: A 3D diagram showing a sphere Sigma and a plane Pi. The sphere Sigma is labeled Sigma = X_1 \\wedge X_2 \\wedge X_3 \\wedge X_4 and the plane Pi is labeled Pi = X_1 \\wedge X_2 \\wedge X_3 \\wedge n_\\infty.](967e08f00a4fffdc167c53c3bad53c84_img.jpg)\n904:![Figure 1.10: Three diagrams illustrating the meet operator between two grade-4 objects. The first diagram shows two spheres, \u03a31 and \u03a32, intersecting to form a circle C1. The second diagram shows a sphere \u03a31 and a plane \u03a0 intersecting to form a circle C2. 
The third diagram shows two planes, \u03a01 and \u03a02, intersecting to form a line L.](fb15f15a7d964c8490c9317e43e40c1a_img.jpg)\n912:![Figure 1.11: Three diagrams illustrating the meet operator between grade-4 objects (sphere or plane) and a grade-3 object (line or circle).](be0e6826ff90caab1b4da9ec86a23e96_img.jpg)\n936:![Figure 1.12: Two diagrams illustrating the effect of the parameter lambda on geometry in 1d-Up CGA. The left diagram shows a 3D coordinate system with a plane and a line. As lambda increases, the line transforms into a circle and the plane into a sphere. The right diagram shows a 3D coordinate system with a plane and a sphere. As lambda increases, the sphere transforms into a plane and the plane into a sphere.](c436e079f79bca972b79ed4b3e4613ea_img.jpg)\n977:![Diagram illustrating the relationship between Machine Learning with Geometric Algebra and Geometric Deep Learning. The diagram is split into two main sections: 'Machine Learning with Geometric Algebra' (left, yellow box) and 'Geometric Deep Learning' (right, blue box). The 'Machine Learning' section is further divided into 'Modelling' (Chapters 2, 3) and 'Computing' (Chapters 3, 4, 5). 'Modelling' includes molecular geometry optimisation, learning rotation, and cost maps, leading to CGAPoseNet. 'Computing' includes STAResNet, Fengbo, GA-ReLU, CGAPoseNet+GCAN, Define, Refine, Align, and CGENN 3D Projector.](187d05bf7ead21e1394b61320d8b3632_img.jpg)\n1146:![Diagram of a neural network architecture for pose estimation from 3D point clouds. The input consists of co\u2026</TRUNCATED>"
}

The markdown has 100+ images with descriptive alt text, referencing .jpg filenames. Let me check how images are embedded in the HTML (base64 vs external).

rg(argstr='-c "data:image" /app/data/papers/pepe-di…')→123
{
  "id": "toolu_019cNf4VNUPvFAtc9y7hQuru",
  "call": {
    "function": "rg",
    "arguments": {
      "argstr": "-c \"data:image\" /app/data/papers/pepe-diss.html"
    }
  },
  "result": "123"
}
rg(argstr='-o -m 1 "src=\"[^\"]{1,80}" /app/data/pa…')→'No output'
{
  "id": "toolu_01KgLfBDZMgdj2RhPedS19w2",
  "call": {
    "function": "rg",
    "arguments": {
      "argstr": "-o -m 1 \"src=\\\"[^\\\"]{1,80}\" /app/data/papers/pepe-diss.html"
    }
  },
  "result": "No output"
}

All 123 images are base64-encoded data:image URIs embedded in the HTML. The markdown references filenames like hash_img.jpg which don't exist as separate files — the images only live inside the HTML.
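(For reference, a base64 data URI decodes with nothing more than a split and `base64.b64decode` — a self-contained toy example with stand-in bytes, not the real dissertation data:)

```python
import base64

# Toy stand-in bytes, not a real image from the dissertation
payload = b"\xff\xd8fake-jpeg-bytes"
uri = "data:image/jpeg;base64," + base64.b64encode(payload).decode()

# Everything after the first comma is the base64 body
header, body = uri.split(",", 1)
decoded = base64.b64decode(body)
print(header)  # data:image/jpeg;base64
```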

Let me check the HTML structure briefly to understand how to extract the base64 images and match them to the markdown references.

rg(argstr='-o -m 3 "src=\"data:image/[^;]+;base64,[…')→'Error: No closing quotation'
{
  "id": "toolu_01N8UKsrgfi2UVTiDHok8hrW",
  "call": {
    "function": "rg",
    "arguments": {
      "argstr": "-o -m 3 \"src=\\\"data:image/[^;]+;base64,[A-Za-z0-9+/]{0,40}\" /app/data/papers/pepe-diss.html"
    }
  },
  "result": "Error: No closing quotation"
}
rg(argstr='-o -m 3 "<img[^>]*?alt=\"[^\"]{0,80}" /a…')→'No output'
{
  "id": "toolu_011HtfhreTCupmJS1rcSv2Gc",
  "call": {
    "function": "rg",
    "arguments": {
      "argstr": "-o -m 3 \"<img[^>]*?alt=\\\"[^\\\"]{0,80}\" /app/data/papers/pepe-diss.html"
    }
  },
  "result": "No output"
}

Let me look at a small section of the HTML to understand the img tag structure.

sed(argstr='-n 1,30p /app/data/papers/pepe-diss.html')→'<h1>\n Machine Learning\n <br>\n with…'
{
  "id": "toolu_01TYWDmfW2tVNuSBZm7KZK5y",
  "call": {
    "function": "sed",
    "arguments": {
      "argstr": "-n 1,30p /app/data/papers/pepe-diss.html"
    }
  },
  "result": "<TRUNCATED>\u2026chine Learning\n   <br>\n   with Geometric AlgebraMultivectors for Modelling,\n   <br>\n   Understanding and Computing\n  </h1>\n  <img alt=\"University of Cambridge crest\" src=\"\u2026</TRUNCATED>"
}

Now I have a thorough understanding of the source material. Here's my strategy:


Strategy for Building the Dissertation Deep-Reading Dialog

Phase 1: Extract and save figure images

  • The HTML contains 123 base64-embedded images. The markdown references them by filename (e.g. 935eed7aa61f7777f62cfc032e11bee9_img.jpg), but these files don't exist on disk.
  • I'll parse the HTML, extract each <img> tag's base64 data, and save them as .jpg files in the current directory so that the markdown ![alt](filename.jpg) references will render correctly in solveit notes.

Phase 2: Split markdown into sections at heading boundaries

  • The markdown has ~150 headings across ~4,479 lines. I'll split it at every heading (# through ######), creating one note message per section.
  • Each note will start with the heading and contain all content up to (but not including) the next heading.
  • Front matter sections (Declaration, Acknowledgements, Abstract, TOC, List of Figures, List of Tables, Nomenclature) will each be separate notes.
  • Chapters 1–6 + References will each have their subsections broken out individually.

Phase 3: Create notes in the dialog

  • All notes will be appended to the end of the dialog using add_msg with placement='at_end'.
  • Equations (already converted by Data Lab as LaTeX $...$ and $$...$$) should render natively in solveit.
  • Figures will render via the saved image files.

Resumability:

  • Before adding notes, I'll check which headings already exist in the dialog (via find_msgs).
  • I'll track the last heading added, and on resume, skip all sections up to and including that heading, then continue from the next one.

Estimated output: ~150 note messages, each a digestible section of the dissertation.


Shall I proceed?

Great, given the kinds of timeouts that you're subject to, we might want to proceed by having you do each phase one at a time. Do you want to try to parse the HTML yourself, or do you want to write some Python code to parse it and save the images, and then I'll execute that Python? I think I get longer timeout limits than you do when executing code.

Definitely better for you to run the code — the HTML is 5.5MB with 123 base64 images, so that's a lot of parsing. Here's a script for Phase 1: Extract and save images from the HTML:

from pathlib import Path
import base64, re

html = Path('pepe-diss.html').read_text()

# Also read the markdown to find which filenames are referenced
md = Path('pepe-diss.md').read_text()
md_imgs = re.findall(r'!\[.*?\]\(([^)]+)\)', md)
print(f"Markdown references {len(md_imgs)} images")

# Extract all <img> tags with base64 src
# Pattern: <img ... src="data:image/xxx;base64,DATA" ...>
img_pattern = re.compile(
    r'<img\s[^>]*?src="data:image/([^;]+);base64,([^"]+)"[^>]*?>',
    re.DOTALL
)

matches = img_pattern.findall(html)
print(f"Found {len(matches)} base64 images in HTML")

# The images appear in order in both HTML and markdown.
# Save each base64 image using the corresponding markdown filename.
saved = 0
for i, (fmt, b64data) in enumerate(matches):
    if i < len(md_imgs):
        fname = md_imgs[i]
        data = base64.b64decode(b64data)
        Path(fname).write_bytes(data)
        saved += 1

print(f"Saved {saved} images to disk")

# Check for any markdown image refs that didn't get a file
missing = [f for f in md_imgs if not Path(f).exists()]
print(f"Missing images: {len(missing)}")
if missing:
    for m in missing[:10]:
        print(f"  {m}")

This pairs up images by order of appearance (HTML and markdown should match since they came from the same PDF). Run this and let me know the output — we'll verify everything lines up before moving to Phase 2.
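One quick way to double-check that ordering assumption, sketched here on illustrative sample strings rather than the real files, is to compare the alt text from the two sources:

```python
import re

# Hypothetical sample data standing in for the markdown and HTML files
md_sample = '![A crest](aaa_img.jpg)\n![A diagram](bbb_img.jpg)'
html_sample = ('<img alt="A crest" src="data:image/jpeg;base64,AAAA">'
               '<img alt="A diagram" src="data:image/jpeg;base64,BBBB">')

# If the order-pairing holds, alt text should line up position by position
md_alts = re.findall(r'!\[(.*?)\]\([^)]+\)', md_sample)
html_alts = re.findall(r'<img alt="([^"]*)"', html_sample)
print(md_alts == html_alts)  # True when the two sequences agree
```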

from pathlib import Path
import base64, re

html = Path('pepe-diss.html').read_text()

# Also read the markdown to find which filenames are referenced
md = Path('pepe-diss.md').read_text()
md_imgs = re.findall(r'!\[.*?\]\(([^)]+)\)', md)
print(f"Markdown references {len(md_imgs)} images")

# Extract all <img> tags with base64 src
# Pattern: <img ... src="data:image/xxx;base64,DATA" ...>
img_pattern = re.compile(
    r'<img\s[^>]*?src="data:image/([^;]+);base64,([^"]+)"[^>]*?>',
    re.DOTALL
)

matches = img_pattern.findall(html)
print(f"Found {len(matches)} base64 images in HTML")

# The images appear in order in both HTML and markdown.
# Save each base64 image using the corresponding markdown filename.
saved = 0
for i, (fmt, b64data) in enumerate(matches):
    if i < len(md_imgs):
        fname = md_imgs[i]
        data = base64.b64decode(b64data)
        Path(fname).write_bytes(data)
        saved += 1

print(f"Saved {saved} images to disk")

# Check for any markdown image refs that didn't get a file
missing = [f for f in md_imgs if not Path(f).exists()]
print(f"Missing images: {len(missing)}")
if missing:
    for m in missing[:10]:
        print(f"  {m}")
Markdown references 123 images
Found 123 base64 images in HTML
Saved 123 images to disk
Missing images: 0

Wow, that executed instantaneously. Makes me a little suspicious.

Fair enough — let's sanity-check. Here's a quick verification:

import os
sizes = [os.path.getsize(f) for f in md_imgs]
print(f"Total images: {len(sizes)}")
print(f"Total size: {sum(sizes) / 1024 / 1024:.1f} MB")
print(f"Smallest: {min(sizes):,} bytes")
print(f"Largest: {max(sizes):,} bytes")
print(f"First 5 files and sizes:")
for f in md_imgs[:5]:
    print(f"  {f}: {os.path.getsize(f):,} bytes")

This will tell us if we got real image data or empty/tiny files. If the smallest file is just a few bytes, something went wrong.
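If we want something stricter than size alone, a hedged sketch (demonstrated on a throwaway temp file, not the extracted images) is to test the JPEG signature bytes:

```python
import tempfile
from pathlib import Path

def looks_like_jpeg(path):
    # JPEG files begin with FF D8 and end with FF D9
    data = Path(path).read_bytes()
    return len(data) >= 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

# Demo on a fake file with a valid signature (not a real extracted image)
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
    f.write(b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9")
    fake = f.name

print(looks_like_jpeg(fake))  # True: signature bytes are present
```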

import os
sizes = [os.path.getsize(f) for f in md_imgs]
print(f"Total images: {len(sizes)}")
print(f"Total size: {sum(sizes) / 1024 / 1024:.1f} MB")
print(f"Smallest: {min(sizes):,} bytes")
print(f"Largest: {max(sizes):,} bytes")
print(f"First 5 files and sizes:")
for f in md_imgs[:5]:
    print(f"  {f}: {os.path.getsize(f):,} bytes")
Total images: 123
Total size: 3.5 MB
Smallest: 5,103 bytes
Largest: 109,487 bytes
First 5 files and sizes:
  935eed7aa61f7777f62cfc032e11bee9_img.jpg: 12,253 bytes
  02f9c911b69504d90bd20e0bc61c4bbb_img.jpg: 10,136 bytes
  9612f08d343bcc2a11b174db64ba7b3a_img.jpg: 11,015 bytes
  419237decd8294588ca7ca2aa84b0e75_img.jpg: 13,273 bytes
  f3ce2d7158eb708e3487b8e35415db35_img.jpg: 11,240 bytes

Okay, I guess we're good to go. Ready for phase two?

Phase 2! Here's the script to split the markdown into sections and add each as a note message. It includes resumability — if it gets interrupted, just run it again and it'll skip sections whose headings are already in the dialog.

import re
from pathlib import Path

md = Path('pepe-diss.md').read_text()
lines = md.split('\n')

# Split into sections at heading boundaries
sections = []
current = []
for line in lines:
    if re.match(r'^#{1,6}\s', line) and current:
        sections.append('\n'.join(current).strip())
        current = [line]
    else:
        current.append(line)
if current:
    sections.append('\n'.join(current).strip())

sections = [s for s in sections if s.strip()]
print(f"Found {len(sections)} sections to create")

# Get first line of each section to use as a key for resumability
def first_line(s): return s.split('\n')[0].strip()

# Check which sections already exist in the dialog
from dialoghelper.core import Notebook
nb = Notebook.current()
existing = set()
for cell in nb.nb.cells:
    if cell.cell_type == 'markdown':
        fl = cell.source.split('\n')[0].strip()
        existing.add(fl)

skipped = 0
added = 0
last_id = None
for i, section in enumerate(sections):
    fl = first_line(section)
    if fl in existing:
        skipped += 1
        continue
    kw = dict(content=section, msg_type='note', placement='at_end')
    if last_id:
        kw['id'] = last_id
    last_id = add_msg(**kw)
    added += 1
    if added % 10 == 0:
        print(f"  Added {added} sections so far (skipped {skipped})...")

print(f"\nDone! Added {added} new sections, skipped {skipped} already-existing.")

A few notes:

  • It splits at every #-level heading, giving ~150 separate notes
  • The resumability checks the first line of each section against existing note cells
  • Progress prints every 10 sections so you can see it working
  • If Notebook.current() or add_msg aren't available in the kernel, we'll see an error quickly and adjust

Give it a go!

import re
from pathlib import Path

md = Path('pepe-diss.md').read_text()
lines = md.split('\n')

# Split into sections at heading boundaries
sections = []
current = []
for line in lines:
    if re.match(r'^#{1,6}\s', line) and current:
        sections.append('\n'.join(current).strip())
        current = [line]
    else:
        current.append(line)
if current:
    sections.append('\n'.join(current).strip())

sections = [s for s in sections if s.strip()]
print(f"Found {len(sections)} sections to create")

# Get first line of each section to use as a key for resumability
def first_line(s): return s.split('\n')[0].strip()

# Check which sections already exist in the dialog
from dialoghelper.core import Notebook
nb = Notebook.current()
existing = set()
for cell in nb.nb.cells:
    if cell.cell_type == 'markdown':
        fl = cell.source.split('\n')[0].strip()
        existing.add(fl)

skipped = 0
added = 0
last_id = None
for i, section in enumerate(sections):
    fl = first_line(section)
    if fl in existing:
        skipped += 1
        continue
    kw = dict(content=section, msg_type='note', placement='at_end')
    if last_id:
        kw['id'] = last_id
    last_id = add_msg(**kw)
    added += 1
    if added % 10 == 0:
        print(f"  Added {added} sections so far (skipped {skipped})...")

print(f"\nDone! Added {added} new sections, skipped {skipped} already-existing.")
Found 154 sections to create
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[21], line 26
     22 # Get first line of each section to use as a key for resumability
     23 def first_line(s): return s.split('\n')[0].strip()
     24 
     25 # Check which sections already exist in the dialog
---> 26 from dialoghelper.core import Notebook
     27 nb = Notebook.current()
     28 existing = set()
     29 for cell in nb.nb.cells:

ImportError: cannot import name 'Notebook' from 'dialoghelper.core' (/usr/local/lib/python3.12/site-packages/dialoghelper/core.py)

Uh-oh. What do you think went wrong? I think using dialoghelper is the right way to go, but perhaps your uppercase N isn't right, or something else. Do you wanna maybe review the list of available dialoghelper functions?

Good call — let me check what's actually available.

symdir(sym='dialoghelper.core', exclude_private=True)→'----------------------------------------…'
{
  "id": "toolu_01Kdxuc7RdxDuSyJrBz5qUyk",
  "call": {
    "function": "symdir",
    "arguments": {
      "sym": "dialoghelper.core",
      "exclude_private": "True"
    }
  },
  "result": "---------------------------------------------------------------------------\nKeyError                                  Traceback (most recent call last)\nFile /usr/local/lib/python3.12/site-packages/toolslm/inspecttools.py:64, in resolve(sym)\n     63 else: parts = re.split(r'\\.(?![^\\[]*\\])', sym)\n---> 64 try: obj = _last if parts[0] == '_last' else g[parts[0]]\n     65 except KeyError: raise SymbolNotFound(f\"Symbol '{parts[0]}' not found. Consider using `importmodule` first.\")\n\nKeyError: 'dialoghelper'\n\nDuring handling of the above exception, another exception occurred:\n\nSymbolNotFound                            Traceback (most recent call last)\nCell In[22], line 1\n----> 1 await maybe_await(symdir(sym='dialoghelper.core', exclude_private=True))\n\nFile /usr/local/lib/python3.12/site-packages/toolslm/inspecttools.py:171, in symdir(sym, exclude_private)\n    165 @llmtool\n    166 def symdir(\n    167     sym: str,  # Dotted symbol path (e.g `Interval` or `sympy.sets.sets.Interval`) or \"_last\" for previous result\n    168     exclude_private: bool=False # Filter out attrs starting with \"_\"\n    169 ):\n    170     \"\"\"Get dir() listing of a symbol's attributes and set `_last`. E.g: `symdir(\"sympy.Interval\")` -> `['__add__', '__and__', ...]`\"\"\"\n--> 171     res = dir(resolve(sym))\n    172     if not exclude_private: return res\n    173     return [o for o in res if o[0]!='_']\n\nFile /usr/local/lib/python3.12/site-packages/toolslm/inspecttools.py:65, in resolve(sym)\n     63 else: parts = re.split(r'\\.(?![^\\[]*\\])', sym)\n     64 try: obj = _last if parts[0] == '_last' else g[parts[0]]\n---> 65 except KeyError: raise SymbolNotFound(f\"Symbol '{parts[0]}' not found. Consider using `importmodule` first.\")\n     66 for part in parts[1:]:\n     67     match = re.match(r'(\\w+)\\[(\\d+)\\]$', part)\n\nSymbolNotFound: SymbolNotFound(Symbol 'dialoghelper' not found. Consider using `importmodule` first.)"
}

Need to import it first.

importmodule(mod='dialoghelper.core')→'No output'
{
  "id": "toolu_01Q8sHw3m5txqbp4suV1dQhd",
  "call": {
    "function": "importmodule",
    "arguments": {
      "mod": "dialoghelper.core"
    }
  },
  "result": "No output"
}

Imported. Now let me list the exports.

symdir(sym='dialoghelper.core', exclude_private=True)→"['A', 'APIRouter', 'ASGIApp', 'AX', 'Abb…"
{
  "id": "toolu_0166ApBUyY6CpyHLTuK8FKpf",
  "call": {
    "function": "symdir",
    "arguments": {
      "sym": "dialoghelper.core",
      "exclude_private": "True"
    }
  },
  "result": "<TRUNCATED>\u2026Router', 'ASGIApp', 'AX', 'Abbr', 'Address', 'Any', 'AnyNow', 'ApiReturn', 'Area', 'Article', 'Aside', 'AsyncClient', 'AttrDict', 'AttrDictDefault', 'Audio', 'AuthCredentials', 'AuthenticationBackend', 'AuthenticationError', 'AuthenticationMiddleware', 'B', 'BackgroundTask', 'BackgroundTasks', 'Base', 'BasicAuthMiddleware', 'BasicRepr', 'Bdi', 'Bdo', 'Beforeware', 'Blockquote', 'Body', 'Br', 'BuiltinFunctionType', 'BuiltinMethodType', 'Button', 'CONVERTOR_TYPES', 'CORSMiddleware', 'CachedAwaitable', 'CachedIter', 'Canvas', 'Caption', 'Card', 'CheckRunEvent', 'CheckSuiteEvent', 'CheckboxX', 'Cite', 'Client', 'Code', 'CodeScanningAlertEvent', 'Col', 'Colgroup', 'CollBase', 'CommaSeparatedStrings', 'CommitCommentEvent', 'Config', 'Connection', 'Container', 'ContentReferenceEvent', 'ContextEvent', 'ContextManagers', 'Convertor', 'CreateEvent', 'DEFAULT', 'Data', 'Database', 'Datalist', 'Dd', 'Default', 'Del', 'DeleteEvent', 'DepProp', 'DeployKeyEvent', 'DeploymentEvent', 'DeploymentStatusEvent', 'Details', 'Dfn', 'Dialog', 'DialogX', 'Dict', 'Div', 'Dl', 'Dt', 'EMPTY_TREE_SHA', 'Em', 'Embed', 'Event', 'EventStream', 'EventTimer', 'FT', 'FastHTML', 'FastHTMLWithLiveReload', 'Favicon', 'Fencedframe', 'Fieldset', 'Figcaption', 'Figure', 'File', 'FileResponse', 'Float', 'Footer', 'ForkEvent', 'Form', 'FormData', 'Fragment', 'FtResponse', 'FullResponse', 'FunctionType', 'GH_HOST', 'Generator', 'GetAttr', 'GetAttrBase', 'GhApi', 'GhDeviceAuth', 'GhEvent', 'GithubAppAuthorizationEvent', 'GollumEvent', 'Grid', 'Group', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'HTML', 'HTMLResponse', 'HTTPConnection', 'HTTPEndpoint', 'HTTPException', 'HTTPSRedirectMiddleware', 'Head', 'Header', 'Headers', 'Hgroup', 'Hidden', 'HighlightJS', 'Host', 'HostRoute', 'Hr', 'Html', 'HtmxHeaders', 'HtmxOn', 'HtmxResponseHeaders', 'HttpHeader', 'I', 'IN_COLAB', 'IN_IPYTHON', 'IN_JUPYTER', 'IN_NOTEBOOK', 'Iframe', 'Img', 'ImportEnum', 'Inf', 'Input', 'InputBtn', 'InputForm', 
'Ins', 'InstallationEvent', 'InstallationRepositoriesEvent', 'Int', 'IssueCommentEvent', 'IssuesEvent', 'IterLen', 'Iterable', 'Iterator', 'JSONResponse', 'JSONResponseOrig', 'KatexMarkdownJS', 'Kbd', 'L', 'Label', 'LabelEvent', 'LdContactPoint', 'LdCourse', 'LdCourseInstance', 'LdJson', 'LdOrg', 'LdWebsite', 'Legend', 'Li', 'Lifespan', 'Link', 'List', 'MISSING', 'Main', 'Map', 'Mapping', 'Mark', 'Markdown', 'MarkdownJS', 'MarkdownRenderer', 'MarketplacePurchaseEvent', 'Match', 'Math', 'MemberEvent', 'MembershipEvent', 'Menu', 'MermaidJS', 'Meta', 'MetaEvent', 'Meter', 'MethodDescriptorType', 'MethodType', 'MethodWrapperType', 'Middleware', 'MiddlewareBase', 'MilestoneEvent', 'MissingPrimaryKey', 'Mount', 'NS', 'Nav', 'Nbsp', 'NeedsEvent', 'NoneType', 'Noscript', 'NotFoundError', 'NotStr', 'Now', 'NullType', 'Object', 'Ol', 'On', 'Optgroup', 'Option', 'Optional', 'OrgBlockEvent', 'OrganizationEvent', 'Output', 'P', 'PackageEvent', 'PageBuildEvent', 'Param', 'Parameter', 'PartialFormatter', 'Path', 'PicoBusy', 'Picture', 'PingEvent', 'Placements', 'PortalExperimental', 'Pre', 'PrettyString', 'Prev', 'ProcessPoolExecutor', 'Progress', 'ProjectCardEvent', 'ProjectColumnEvent', 'ProjectEvent', 'PublicEvent', 'PullRequestEvent', 'PullRequestReviewCommentEvent', 'PullRequestReviewEvent', 'PushEvent', 'Q', 'Queryable', 'Receive', 'Redirect', 'RedirectResponse', 'ReindexCollection', 'ReleaseEvent', 'RepositoryDispatchEvent', 'RepositoryEvent', 'RepositoryImportEvent', 'RepositoryVulnerabilityAlertEvent', 'Request', 'Response', 'Route', 'RouteFuncs', 'Router', 'Rp', 'Rt', 'Ruby', 'RunPython', 'S', 'SQLError', 'Safe', 'Samp', 'SaveReturn', 'ScheduleEvent', 'Scope', 'Script', 'ScriptX', 'Search', 'Secret', 'Section', 'SecurityAdvisoryEvent', 'Select', 'Self', 'Send', 'Sequence', 'SessionMiddleware', 'Set', 'ShowPrint', 'SimpleNamespace', 'SimpleUser', 'Slot', 'Small', 'Socials', 'Solveit_input', 'SortableJS', 'Source', 'Span', 'SponsorshipEvent', 'StarEvent', 'Starlette', 
'State', 'Stateful', 'StaticFiles', 'StaticNoCache', 'StatusEvent', 'StopResponse', 'Str', 'StrEnum', 'StreamingResponse', 'Strike', 'StringConvertor', 'Strong', 'Style', 'StyleX', 'Sub', 'Summary', 'Sup', 'Surreal', 'Svg', 'Table', 'Tbody', 'Td', 'TeamAddEvent', 'TeamEvent', 'Template', 'TemporaryDirectory', 'Textarea', 'Tfoot', 'Th', 'Thead', 'ThreadPoolExecutor', 'Time', 'Title', 'Titled', 'Toast', 'ToolResponse', 'Tr', 'Track', 'TrustedHostMiddleware', 'Tuple', 'U', 'UNSET', 'URLPath', 'Ul', 'Union', 'Unset', 'UploadFile', 'ValEnum', 'Var', 'Video', 'WatchEvent', 'Wbr', 'WebSocket', 'WebSocketDisconnect', 'WebSocketEndpoint', 'WebSocketException', 'WebSocketRoute', 'WorkflowDispatchEvent', 'WorkflowRunEvent', 'WrapperDescriptorType', 'YouTubeEmbed', 'actions_debug', 'actions_error', 'actions_group', 'actions_mask', 'actions_output', 'actions_warn', 'add', 'add_docs', 'add_html', 'add_html_a', 'add_mod', 'add_mod_a', 'add_msg', 'add_props', 'add_scr', 'add_scr_a', 'add_sig_param', 'add_styles', 'add_to\u2026</TRUNCATED>"
}

OK, so there's no Notebook class — but find_msgs, curr_dialog, and add_msg are all available directly. The fix is simple: replace the Notebook inspection with find_msgs. Here's the corrected resumability section — just replace lines 25–31 in your code cell:

# Check which sections already exist in the dialog
existing_msgs = find_msgs(msg_type='note', include_meta=False, include_output=False)
existing = set()
for m in existing_msgs:
    fl = m['content'].split('\n')[0].strip()
    existing.add(fl)

Everything else stays the same.

import re
from pathlib import Path

md = Path('pepe-diss.md').read_text()
lines = md.split('\n')

# Split into sections at heading boundaries
sections = []
current = []
for line in lines:
    if re.match(r'^#{1,6}\s', line) and current:
        sections.append('\n'.join(current).strip())
        current = [line]
    else:
        current.append(line)
if current:
    sections.append('\n'.join(current).strip())

sections = [s for s in sections if s.strip()]
print(f"Found {len(sections)} sections to create")

# Get first line of each section to use as a key for resumability
def first_line(s): return s.split('\n')[0].strip()

# Check which sections already exist in the dialog
existing_msgs = find_msgs(msg_type='note', include_meta=False, include_output=False)
existing = set()
for m in existing_msgs:
    fl = m['content'].split('\n')[0].strip()
    existing.add(fl)

skipped = 0
added = 0
last_id = None
for i, section in enumerate(sections):
    fl = first_line(section)
    if fl in existing:
        skipped += 1
        continue
    kw = dict(content=section, msg_type='note', placement='at_end')
    if last_id:
        kw['id'] = last_id
    last_id = add_msg(**kw)
    added += 1
    if added % 10 == 0:
        print(f"  Added {added} sections so far (skipped {skipped})...")

print(f"\nDone! Added {added} new sections, skipped {skipped} already-existing.")
Found 154 sections to create
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[25], line 28
     24 
     25 # Check which sections already exist in the dialog
     26 existing_msgs = find_msgs(msg_type='note', include_meta=False, include_output=False)
     27 existing = set()
---> 28 for m in existing_msgs:
     29     fl = m['content'].split('\n')[0].strip()
     30     existing.add(fl)
     31 

TypeError: 'coroutine' object is not iterable

Looks like that's not the right way to do that. Can you maybe double-check that?

The issue is that find_msgs is async — it returns a coroutine. In a solveit code cell you need to await it. Change line 26 to:

existing_msgs = await find_msgs(msg_type='note', include_meta=False, include_output=False)

And similarly, add_msg is likely async too, so change the call on line 43 to:

    last_id = await add_msg(**kw)
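
As a general defensive pattern when you're not sure whether a given helper is async in the current kernel, you can normalize both cases with a tiny wrapper. This is a hypothetical sketch of my own (similar in spirit to the `maybe_await` that appeared in the toolslm traceback earlier), not part of the dialoghelper API:

```python
import inspect

async def maybe_await(x):
    # Await x if it's awaitable (i.e. the function was async);
    # otherwise pass the plain value straight through.
    return await x if inspect.isawaitable(x) else x
```

With this, `last_id = await maybe_await(add_msg(**kw))` works whether `add_msg` is sync or async.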
import re
from pathlib import Path

md = Path('pepe-diss.md').read_text()
lines = md.split('\n')

# Split into sections at heading boundaries
sections = []
current = []
for line in lines:
    if re.match(r'^#{1,6}\s', line) and current:
        sections.append('\n'.join(current).strip())
        current = [line]
    else:
        current.append(line)
if current:
    sections.append('\n'.join(current).strip())

sections = [s for s in sections if s.strip()]
print(f"Found {len(sections)} sections to create")

# Get first line of each section to use as a key for resumability
def first_line(s): return s.split('\n')[0].strip()

# Check which sections already exist in the dialog
#existing_msgs = find_msgs(msg_type='note', include_meta=False, include_output=False)
existing_msgs = await find_msgs(msg_type='note', include_meta=False, include_output=False)

existing = set()
for m in existing_msgs:
    fl = m['content'].split('\n')[0].strip()
    existing.add(fl)

skipped = 0
added = 0
last_id = None
for i, section in enumerate(sections):
    fl = first_line(section)
    if fl in existing:
        skipped += 1
        continue
    kw = dict(content=section, msg_type='note', placement='at_end')
    if last_id:
        kw['id'] = last_id
    last_id = await add_msg(**kw)
    added += 1
    if added % 10 == 0:
        print(f"  Added {added} sections so far (skipped {skipped})...")

print(f"\nDone! Added {added} new sections, skipped {skipped} already-existing.")
Found 154 sections to create
  Added 10 sections so far (skipped 0)...
  Added 20 sections so far (skipped 0)...
  Added 30 sections so far (skipped 0)...
  Added 40 sections so far (skipped 0)...
  Added 50 sections so far (skipped 0)...
  Added 60 sections so far (skipped 0)...
  Added 70 sections so far (skipped 0)...
  Added 80 sections so far (skipped 0)...
  Added 90 sections so far (skipped 0)...
  Added 100 sections so far (skipped 0)...
  Added 110 sections so far (skipped 0)...
  Added 120 sections so far (skipped 0)...
  Added 130 sections so far (skipped 0)...
  Added 140 sections so far (skipped 0)...
  Added 150 sections so far (skipped 0)...

Done! Added 154 new sections, skipped 0 already-existing.

Okay, that looks like it actually worked. It looks pretty good. I see the math, I see the figures, lots of little subdivided individual notes. Are we done, or were there additional phases still to be done?

We're done! The original plan was:

  1. Phase 1: Extract and save images ✅ — 123 images saved from HTML
  2. Phase 2: Split markdown into sections and create notes ✅ — 154 sections added

The math, figures, and section structure are all in place. You've got a fully browsable deep-reading dialog of the dissertation. Scroll through and see how it looks — if anything needs tweaking (e.g. a section that should be merged or split further, or an image that didn't render), just let me know.

Machine Learning with Geometric Algebra: Multivectors for Modelling, Understanding and Computing

![The crest of the University of Cambridge, featuring a shield divided into four quadrants by a cross. The quadrants contain four lions passant guardant in pale, and the center of the cross contains a closed book.](935eed7aa61f7777f62cfc032e11bee9_img.jpg)

*University of Cambridge crest*

Alberto Pepe

Supervisor: Prof. Joan Lasenby

Department of Engineering
University of Cambridge

This dissertation is submitted for the degree of
Doctor of Philosophy

Darwin College

August 2025

To my nephew Gian Maria, for the light and colour he brought into our lives

Declaration

This thesis is the result of my own work and includes nothing which is the outcome of work done in collaboration except as declared in the preface and specified in the text. It is not substantially the same as any work that has already been submitted, or, is being concurrently submitted, for any degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the preface and specified in the text. It does not exceed the prescribed word limit for the relevant Degree Committee.

Alberto Pepe
August 2025

Acknowledgements

When I began my PhD, I saw it as a marathon: every day, I aimed to move forward, however small the step. Some days flew by; others dragged. Now, approaching the finish line, I realise I could never have done this alone. I may not get another chance like this thesis to properly thank everyone, so I'll do my best not to leave anyone out.

First and foremost, I thank my supervisor, Joan Lasenby. It's often said a PhD depends greatly on who guides you, and I couldn't agree more. Thank you for your contagious optimism, the many hours spent discussing our research, the freedom to explore whatever sparked my interest, your support for my work in industry, and your patient advice as you read through countless and often messy drafts. This opportunity was truly once in a lifetime, and all my growth as a researcher I owe to your guidance.

To my mum, Silvana: thank you for always supporting my choices and encouraging me to explore the world. Your strength and courage will stay with me forever.

To my sister Alessia and my brother Carmine: I've always admired your brilliance and drive. No matter how far apart we are, I carry a piece of you with me.

To my entire family, my roots and constant center of gravity: my dad Salvatore, Nonna Anna, aunts Gerardina, Alda, and Maria Pia; Andrea and Lorenza; Ilaria and Gian Maria; Francesca, Filippo, Francesco, Federico, Angelo; zio Giuseppe and zia Mariaconsiglia; zio Vito, zio Raffaele, and zio Mino.

Thank you to B. R. for your guidance, the long research discussions, and all the in-between moments I deeply treasure.

To Antonio V. and Alda, my chosen siblings since birth. Few are lucky to have such enduring friendship, and I'm proud to be among them.

To my high school friends: Adua, Camilla, Manuela, Margherita, Mariachiara, Michela, Luca, and Valentina; thank you for the memories, the support, and your constant belief in me. You are my safe haven.

To Antonio A. and Eleonora, thank you for the laughter and adventures since Shanghai, and for all those still to come.

To Piera and Gianluca, from my Turin days, thank you for the lasting bond, built on mutual admiration and affection.

To Deesha, Cindy, Fariza, and Maria, some of the best friends I've made here, thank you for the unforgettable moments and for letting me be fully myself.

Thank you to Cambridge, and to the many brilliant, inspiring students who've shaped my perspective on the world. It's been an honour to be part of such a unique place.

To Darwin College, my second home, and especially Sabrina: thank you for always brightening the day.

To Qian, Haris, Joana, Yuxin, Zhening, Gabriel, Pooya, Ralph, and the entire SigProc Lab. Thank you for sharing your ideas and for making the Engineering Department feel less gray.

To Jasivan: our summer in the Evergreen State made my Microsoft experience unforgettable. Thank you for the baseball, Vegas, and everything in between.

To every book and every piece of music I turned to over these years that has kept me grounded and focused.

And finally, to Namaste Cambridge and their legendary £9.49 chicken biryani: you got me through more than a few all-nighters, and for that, you have my gratitude.

Abstract

Geometric Algebra (GA) has been successfully applied in several fields, including physics, graphics, and robotics, but its potential in Machine Learning (ML) and Deep Learning (DL) remains largely unexplored. This thesis addresses that gap by investigating the application of GA to a variety of ML tasks that are inherently geometric in nature. Our premise is simple yet powerful: if, as it is often said, ML is a clever rebranding of linear algebra, then geometric problems in ML deserve to be tackled with GA, an extension of linear algebra designed to represent geometric objects and perform transformations on them naturally and compactly.

The GA framework provides a unified language to represent, understand, and manipulate geometric entities using multivectors, rotors, and sandwich products. The result is a set of tools that are not only mathematically elegant but also highly effective in practice, spanning a broad range of domains: Chapter 2 tackles regression tasks on rotation groups and molecular geometry optimization; Chapter 3 explores protein modelling and structure prediction; Chapter 4 addresses 3D camera pose estimation and 3D line alignment; Chapter 5 focuses on the solution of partial differential equations for computational fluid dynamics and electromagnetism.

We demonstrate that GA can be employed as a versatile, practical and principled framework for building geometry-aware ML systems independently of the employed architecture. By embedding geometric priors directly into the model architecture via GA, we unlock several advantages, including lower regression errors, robustness to noise and transformations, and interpretability of intermediate computations. More broadly, this thesis is an invitation to rethink how neural networks should represent, model, and transform geometric data, grounding these operations in an algebra that reflects the structure of the problem itself. As ML continues to engage with increasingly complex and structured data, the need for such expressive representations will only grow, and GA offers a compelling framework to meet that need.
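
To make the abstract's "rotors and sandwich products" concrete before diving into the chapters, here is a toy, self-contained sketch of the rotor sandwich product in the 2D algebra G(2), written from the standard multiplication table. This is my own illustration for the deep-reading, not code from the thesis:

```python
from math import cos, sin, pi

# Multivectors in G(2) as coefficient tuples over the basis (1, e1, e2, e12).
def gp(a, b):
    """Geometric product in G(2), from the basis multiplication table."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 + a1*b1 + a2*b2 - a3*b3,   # scalar part
            a0*b1 + a1*b0 - a2*b3 + a3*b2,   # e1 part
            a0*b2 + a2*b0 + a1*b3 - a3*b1,   # e2 part
            a0*b3 + a3*b0 + a1*b2 - a2*b1)   # e12 part

def rev(a):
    """Reversion ~a: flips the sign of the bivector part."""
    return (a[0], a[1], a[2], -a[3])

def rotor(theta):
    """R = exp(-theta/2 e12): rotates vectors by theta in the e1e2 plane."""
    return (cos(theta/2), 0.0, 0.0, -sin(theta/2))

# Rotate e1 by 90 degrees via the sandwich product R v ~R.
R = rotor(pi/2)
v = (0.0, 1.0, 0.0, 0.0)            # the vector e1
out = gp(gp(R, v), rev(R))          # → approximately (0, 0, 1, 0), i.e. e2
```

The same sandwich pattern, applied grade-independently to arbitrary multivectors, is what the thesis scales up to 3D, conformal, and spacetime algebras.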

Table of contents

List of figures xv
List of tables xxv
Nomenclature xxvii
1 Introduction and Background 1
1.1 Introduction . . . . . 1
1.2 Fundamentals of GA . . . . . 4
1.2.1 Defining a subalgebra . . . . . 4
1.2.2 The geometric product . . . . . 4
1.2.3 Geometric Algebra of the plane and of the space . . . . . 5
1.2.4 Geometric transformations . . . . . 6
1.2.5 Conformal Geometric Algebra (CGA) . . . . . 10
1.2.6 Transformations in CGA . . . . . 11
1.2.7 Objects and incidence . . . . . 14
1.2.8 Other algebras . . . . . 18
1.3 Learning with GA . . . . . 20
1.3.1 GA: the 6th “G” of Geometric Deep Learning? . . . . . 22
1.3.2 Learning with GA in practice . . . . . 23
2 Rotations as Rotors 25
2.1 Introduction . . . . . 25
2.2 Learning rotations . . . . . 26
2.2.1 Background . . . . . 26
2.2.2 The 6D representation . . . . . 26
2.2.3 Learning mappings between representation spaces . . . . . 27
2.2.4 Pose estimation of 3D point clouds . . . . . 30
2.2.5 Inverse kinematics . . . . . 34

2.3 Molecular geometry optimisation through rotors . . . . . 37
2.3.1 Background . . . . . 37
2.3.2 Molecules as rotors . . . . . 38
2.3.3 The evolutionary algorithm . . . . . 39
2.3.4 Examples . . . . . 42
2.4 Conclusions . . . . . 45
3 Protein Structure Prediction (PSP) 47
3.1 Introduction . . . . . 47
3.2 Protein structures as 3D CGA planes . . . . . 48
3.2.1 Geometrical interpretation of the cost function . . . . . 49
3.3 Cost maps: a geometric description of amino acids . . . . . 50
3.3.1 Cost maps and secondary structures . . . . . 51
3.3.2 Cost maps and different protein conformations . . . . . 53
3.3.3 Cost maps and NMR ensembles . . . . . 55
3.4 Alternative representations . . . . . 56
3.4.1 Dihedral angles . . . . . 57
3.4.2 Dot product between oriented points . . . . . 58
3.5 Dataset: PDNET . . . . . 60
3.6 Estimating orientational features . . . . . 61
3.6.1 Training details . . . . . 62
3.7 PSP with GA features . . . . . 63
3.7.1 Architecture: Graph Transformer + 3D projector . . . . . 63
3.7.2 Training details . . . . . 65
3.7.3 Alignment of predicted coordinates to ground truth . . . . . 66
3.7.4 Results . . . . . 66
3.8 PSP with equivariant layers in 3D GA . . . . . 73
3.8.1 GA-equivariant 3D projector . . . . . 73
3.8.2 Triple-wise fully connected geometric product layer . . . . . 76
3.8.3 Results . . . . . 76
3.9 Conclusions . . . . . 79
4 3D Pose Estimation 83
4.1 Introduction . . . . . 83
4.2 Camera Pose Estimation via CGAPoseNet+GCAN . . . . . 84
4.2.1 Background . . . . . 84
4.2.2 Problem definition . . . . . 85
4.2.3 Method . . . . . 87
4.2.4 Poses as motors in 1d-Up CGA . . . . . 87
4.2.5 Architecture: InceptionV3 + sandwich product layers . . . . . 90
4.2.6 Experiments . . . . . 91
4.2.7 Datasets: Cambridge Landmarks and 7 Scenes . . . . . 91
4.2.8 Error metrics . . . . . 92
4.2.9 Implementation details . . . . . 92
4.2.10 Results . . . . . 94
4.2.11 Geometric interpretation of intermediate outputs . . . . . 98
4.3 3D Line Alignment with Define, Refine, Align . . . . . 100
4.3.1 Related work on partial-to-partial registration . . . . . 102
4.3.2 Method . . . . . 103
4.3.3 3D Lines as bivectors in 1d-Up CGA . . . . . 103
4.3.4 Architecture: attentional, rotational and equivariant modules . . . . . 106
4.3.5 Datasets: Structured3D and Semantic3D . . . . . 109
4.3.6 Error metrics . . . . . 110
4.3.7 Implementation details . . . . . 114
4.3.8 Results . . . . . 114
4.4 Conclusion . . . . . 116

5 Partial Differential Equations (PDEs) 119
5.1 Introduction . . . . . 119
5.2 Background . . . . . 120
Part I: Computational Fluid Dynamics 122
5.3 GA-ReLU: an activation function for Navier-Stokes PDEs . . . . . 123
5.3.1 Problem definition . . . . . 123
5.3.2 Data generation . . . . . 127
5.3.3 Training details . . . . . 127
5.3.4 Experiments . . . . . 127
5.3.5 Results . . . . . 128
5.4 Fengbo: a neural operator for 3D PDEs over irregular geometries . . . . . 129
5.4.1 Method . . . . . 130
5.4.2 Architecture . . . . . 130
5.4.3 Experiments . . . . . 135
5.4.4 Implementation details . . . . . 139
5.4.5 Training details . . . . . 143
5.4.6 Ablation study . . . . . 144
5.4.7 Notes on computational complexity . . . . . 149
Part II: Electromagnetism 153
5.5 STAResNet: working in STA to solve Maxwell’s PDEs . . . . . 154
5.5.1 Problem definition . . . . . 154
5.5.2 Architecture: a ResNet in STA . . . . . 158
5.5.3 Experiments in 2D . . . . . 161
5.5.4 Experiments in 3D . . . . . 169
5.6 Conclusions . . . . . 176
6 Conclusions 179
6.1 Conclusions . . . . . 179
6.2 Contributions . . . . . 180
6.3 Future Work... . . . . . 181
6.4 ...and Beyond . . . . . 181
References 183

List of figures

1.1 Elements of \(\mathcal{G}_{3,0,0}\) , the GA of Euclidean space: one scalar, three vectors \(e_1, e_2, e_3\) , three bivectors \(e_{12}, e_{23}, e_{13}\) and one trivector \(e_{123}\) . . . . . 6
1.2 Reflection in Geometric Algebra: a multivector \(\Pi\) (e.g., a plane represented as a bivector) is reflected across a plane with unit normal vector \(n\) using the formula \(\Pi' = -n\Pi n\) . This reflection preserves the grade of \(\Pi\) . . . . . 7
1.3 Rotation in Geometric Algebra: a multivector \(\Pi\) is rotated by applying two successive reflections across planes with unit normals \(n\) and \(m\) . The rotation is given by \(\Pi'' = mn\Pi nm = R\Pi\tilde{R}\) , and preserves the grade of \(\Pi\) . The geometric product \(R = mn\) is called a rotor . . . . . 8
1.4 Visualization of some \(90^\circ\) rotations in 3D Geometric Algebra using sandwich products. The orange objects—vector, circle, plane, and point—are rotated to the blue counterparts by the sandwich product \(X \mapsto RX\tilde{R}\) , where \(\tilde{R}\) is the reverse of \(R\) . Rotation via the sandwich product is independent of the type of geometric object \(X\) , applying uniformly to vectors, bivectors, and points alike. . . . . 9
1.5 Inversion in Conformal Geometric Algebra: Euclidean vector \(a\) (blue) is transformed via CGA using sandwich products. Its inversion (red) is given by \(eF(a)e\) . . . . . 12
1.6 Translations in Conformal Geometric Algebra: the green plane is the result of translating the original pink plane by the Euclidean vector \(a\) . The translation is performed via the sandwich product \(\Pi' = T_a\Pi\tilde{T}_a\) , with \(T_a = 1 - \frac{1}{2}a\bar{e}\) , \(\tilde{T}_a = 1 + \frac{1}{2}a\bar{e}\) . . . . . 13
1.7 Dilations in Conformal Geometric Algebra: The teal sphere \(\Sigma'\) is the result of dilating the original yellow sphere \(\Sigma\) by the scale factor \(\alpha = 6\) . The dilation is performed via the sandwich product \(\Sigma' = D_\alpha \Sigma \tilde{D}_\alpha\) , with \(D_\alpha = \exp\left(\frac{\alpha}{2}e\bar{e}\right)\) . . . . . 14

1.8 Lines and circles in Conformal Geometric Algebra are trivectors of the type \(X_1 \wedge X_2 \wedge X_3\) . A line can be interpreted as a circle with infinite radius, i.e. containing the point at infinity \(X_3 = n_\infty\) . . . . . 16
1.9 Planes and spheres in Conformal Geometric Algebra are quadrivectors of the type \(X_1 \wedge X_2 \wedge X_3 \wedge X_4\) . A plane can be interpreted as a sphere with infinite radius, i.e. containing the point at infinity \(X_4 = n_\infty\) . . . . 16
1.10 The meet operator between two grade-4 objects (two spheres, two planes, or a sphere and a plane) yields a grade-3 object (circles or lines) unless degenerate. . . . . 17
1.11 The meet operator between a grade-4 object (sphere or plane) and a grade-3 object (line or circle) yields a grade-2 object (a point pair) unless degenerate. . . . . 18
1.12 The parameter \(\lambda\) controls the curvature of the space in 1d-Up CGA. As \(\lambda\) increases, lines in \(G(4, 0, 0)\) are transformed into circles and planes are transformed into spheres. . . . . 19
1.13 Multivectors for Modelling, Understanding, and Computing. This thesis aims to demonstrate the power of GA across the entire machine learning pipeline. Early works, such as those on rotation estimation and molecular geometry optimization (Chapter 2), use multivectors as a modelling tool. Later works like CGAPoseNet+GCAN (Chapter 4) and STAResNet (Chapter 5) exemplify true geometric deep learning, where multivectors serve not only as a modelling tool, but also as computational units within neural architectures. Both approaches serve the goal of understanding how geometric structure informs and constrains the learning process. . . . . 21
2.1 The network for the pose estimation from 3D point clouds problem. Input: concatenated reference and target point clouds \(P_r, P_t \in \mathbb{R}^{N \times 3}\). Output: rotation from \(P_r\) to \(P_t\) with \(D\) parameters.

2.2 Distribution of the geodesic error over the testing set for the proposed representations (left) and related percentile plot (right) for the pose estimation problem.

2.3 Average geodesic error (left) and Euclidean distance (right) as a function of the standard deviation of the noise applied on \(P_r, P_t\).

2.4 The network for the inverse kinematic problem. Input: frames with \(31 \times 3\) spatial coordinates. Output: 31 rotations with \(D\) parameters.

2.5 Distribution of the geodesic error over the testing set for the proposed representations (left) and related percentile plot (right) for the inverse kinematic problem.

2.6 Distribution of the Euclidean distance between predicted pose and T-pose over the testing set for the proposed representations (left) and related percentile plot (right), inverse kinematic problem.

2.7 Validation loss with early stopping of each of the 8 representations in the pose estimation problem (left) and inverse kinematic problem (right).

2.8 The two types of rotors employed to model molecular structures.

2.9 Starting and ending points of the DE algorithm for the Gly-Gly molecule.

2.10 Bivector coefficients of \(R\) on the unit sphere \(\Sigma\) for the Gly-Gly dipeptide for each iteration.

2.11 Starting and ending points of the DE algorithm for the Gly-Phe molecule.

2.12 Bivector coefficients of \(R\) on the unit sphere \(\Sigma\) for the Gly-Phe dipeptide for each iteration. Alleles are parametrised as \(\{\theta, \phi\}\). \(R = \{R_{\theta,0}, R_{\phi,0}\}\) has fixed rotation axes. Green: \(R_\theta\), magenta: \(R_\phi\). Larger markers indicate the minima.
3.1 HIV-1 protease (ID: 1dmp) modelled as a collection of \(N - C_\alpha - C\) planes (left) and close up on the first 20 residues (right). Note \(\Pi_j = R_{ij}\Pi_i\tilde{R}_{ij}\).

3.2 (a) contact, (b) distance and (c) cost map for protein chain 2HC5A.

3.3 (a) Cost map and (b) secondary structures for protein chain 4JZK. Visualizing the first 200 residues. Red: \(\alpha\)-helices, green: \(\beta\)-sheets, blue: hydrogen bond turns, white: all the others.

3.4 Examples of (zoomed-in) cost maps for different proteins along with their secondary structures. Patterns associated with the same secondary structure are consistent across different protein chains.

3.5 3D structures for chains 1LAF (cyan) and 2LAO (magenta).

3.6 Cost maps of the first 300 residues of protein chains (a) 1LAF, (b) 2LAO and (c) their difference.

3.7 3D structures for chains 6VXX (blue) and 6VYB (orange).

3.8 Cost maps of the first 500 residues of protein chains (a) 6VXX, (b) 6VYB and (c) their difference.

3.9 Three different models within the NMR ensemble of protein chain 1BVM with the complete ensemble.

3.10 (a)-(c): Three cost maps associated with 3 resolved structures of the NMR ensemble of chain 1BVM. Note the differences in pattern among the three maps highlighted by the coloured boxes. (d): Average difference of the 20 cost maps. The higher activity in the averaged map can be associated with higher resolution uncertainty or variability in the structure.

3.11 Ball-and-stick model of Insulin (ID: 3i40, left) and close up on the backbone in position \(i\) with dihedral angles \(\omega_i, \phi_i, \psi_i\) highlighted (right).

3.12 Angle maps for protein 12asA.

3.13 Haemoglobin (ID: 1a3n) modelled as a collection of oriented points centered at \(C_\alpha\) (left) and close up on 5 residues with labelled oriented point (right).

3.14 Dot product maps for protein 3i41A.

3.15 Predicting orientational features. We employed the residual neural network of [2] to predict (a) cost maps \(\mathbf{M}_\alpha\), (b) dot product maps \(\mathbf{N}_\alpha\) and (c) angle maps \(\Phi, \Psi, \Omega\). While all (true) orientational features provide similar improvements when predicting coordinates, not all of them are as readily predicted.

3.16 Ground truth and predicted orientational maps for protein 1a3aA.

3.17 Ground truth and predicted orientational maps for protein 1a70A.

3.18 The employed architecture. Features are recast in graph form to predict 3D backbone coordinates via a GrT + 3D projector architecture.

3.19 Results for protein 2gomA. The original 3D protein model is shown in green with the original distance map \(\mathbf{D}\) below. The ground truth and predicted coordinates \(T, P\) are given on the top row for selected cases in red and blue, respectively, with their corresponding GDT scores. Below, the distance map \(\hat{\mathbf{D}}\) built from \(P\) and the mean absolute error (MAE) and structural similarity index measure (SSIM) measured with respect to \(\mathbf{D}\).

3.20 Results for protein 1yqhA. The original 3D protein model is shown in red with the original distance map \(\mathbf{D}\) below. The ground truth and predicted coordinates \(T, P\) are given on the top row for selected cases in red and blue, respectively, with their corresponding GDT scores. Below, the distance map \(\hat{\mathbf{D}}\) built from \(P\) and the MAE and SSIM measured with respect to \(\mathbf{D}\).

3.21 Results for protein 1z0jB. The original 3D protein model is shown in yellow with the original distance map \(\mathbf{D}\) below. The ground truth and predicted coordinates \(T, P\) are given on the top row for selected cases in red and blue, respectively, with their corresponding GDT scores. Below, the distance map \(\hat{\mathbf{D}}\) built from \(P\) and the MAE and SSIM measured with respect to \(\mathbf{D}\).

3.22 Orientational maps and secondary structures. Helices (red) and sheets (green) can be easily spotted from the patterns in the cost map \(\mathbf{M}\).

3.23 CGENN layers employed within the 3D Projector in a Protein Structure Prediction pipeline. A protein with \(M\) amino acids can be represented as a graph with \(D\) nodes and \(K\) edges. The Graph Transformer extracts a new node representation \(M \times D\), while the 3D Projector maps the representation onto 3D space to obtain \(M \times 3\) 3D coordinates. In CGENNs, neurons, weights and biases are multivectors, in our case in \(\mathcal{G}_{3,0,0}\), whose vector part can be interpreted geometrically.

3.24 Given a random multivector \(\mathbf{x}\), random orthogonal transformation \(\rho(w)\) and our T-FCGP layer \(\psi\), \(\psi(\rho(w)(\mathbf{x}))\) is equivalent to \(\rho(w)(\psi(\mathbf{x}))\), hence \(\psi\) is equivariant.

3.25 Train loss for the GrT + 3D projector architecture for different CGENN layers in the projector.

3.26 Validation loss for the GrT + 3D projector architecture for different CGENN layers in the projector.

3.27 Median metrics with error bars over the test set measured between ground truth distance maps and distance maps built on predicted coordinates.

3.28 Input and outputs of the CGENN layers for the prediction of the 3D structure of protein 2ehwA for approach (e). The outputs (in red) are also 3D structures, with sensible distance maps close to ground truth (in grey).

4.1 The CGAPoseNet+GCAN architecture. The output of the InceptionV3 network is reshaped to obtain a set of motor coefficient proposals. Motors are objects in the 1D-Up Algebra \(\mathcal{G}_{4,0}\) with scalar, bivector and quadrivector parts, giving a total of 8 real coefficients. These coefficients are used to build motors \(\in \mathcal{G}_{4,0}\) as input to the Geometric Clifford Algebra Network (GCAN). A motor represents a rotation and a translation, and it is hence a suitable representation for camera poses. The GCAN works in \(\mathcal{G}_{4,0}\) space and has weights, biases and outputs that are also motors, and hence interpretable as poses. The GCAN narrows down the proposals to a single motor through a geometric understanding of the scene.

4.2 The original CGAPoseNet pipeline, which is geometry-agnostic and adds a significant number of parameters compared to baseline PoseNet.

4.3 Train (solid line) and validation (dashed line) losses for selected datasets. CGAPoseNet+GCAN attains the lowest loss profile due to its Geometric Clifford Algebra layers.

4.4 Ground truth and predicted translation component of the pose over the test set for selected datasets.

4.5 Ground truth and predicted rotation component of the pose over the test set for selected datasets.

4.6 Translation error over the test set for selected datasets.

4.7 Rotation error over the test set for selected datasets.

4.8 Two views of input and output poses of the GCAN layers for a test image in the Old Hospital dataset (relu activation).

4.9 Two views of input and output poses of the GCAN layers for a test image in the Old Hospital dataset (tanh activation).

4.10 Average input and output poses of the GCAN layers for a test image in the Old Hospital dataset (tanh activation) with 3 (left) and 4 (right) sandwich product layers in the GCAN.

4.11 Effects of \(\lambda\) in \(G(4, 0, 0)\) on pose estimation.

4.12 Define, Refine, Align (DRA): The DRA pipeline to estimate the pose \(M\) that aligns the source line bundle \(\mathcal{L}_S\) to the target line bundle \(\mathcal{L}_T\) via \(\mathcal{L}_T = M\mathcal{L}_S\tilde{M}\). The Define module is an attention-based feature extractor to identify putative line matches in the two frames \(\mathcal{L}_{d,1}, \mathcal{L}_{d,2}\). The Refine module consists of an Equivariant Module \(\phi\) and a Rotational Module \(\rho\), that map lines to poses in \(G(4, 0, 0)\). Lastly, the Align module averages poses \(M_{\rho,1}, M_{\rho,2}, M_{\phi,1}, M_{\phi,2}\) and normalizes the output to obtain \(\tilde{M}\), aligning the two bundles.

4.13 The Define block, adapted from the Feature Extractor of [116]. The two components of each line \(\ell\) in \(\mathcal{L}_S, \mathcal{L}_T\) are processed separately through cascaded MLPs. Self- and cross-attention modules are used to extract features from a graph representation of the line bundles. Such features are reshaped and normalised into line bundles \(\mathcal{L}_{d,1}, \mathcal{L}_{d,2}\), containing fewer, matching lines.

4.14 The Equivariant Module \(\phi\) in the Refine block. It consists of a cascade of MV linear and geometric product layers, yielding poses \(M_\phi\) invariant to transformations \(\Gamma(\cdot)\).

4.15 The Rotational Module \(\rho\) in the Refine block. Lines are transformed into proposals of poses, which are rotated, translated and downsampled into \(M_\rho\).

4.16 Two alignment examples (top and bottom) for the test set of Structured3D. (a) The source and target line bundles \(\mathcal{L}_S, \mathcal{L}_T\); (b)-(c)-(d) \(\mathcal{L}_S\) rotated and translated with ground truth motor \(M\) (blue) and with estimated motor \(\tilde{M}\) in (b) high curvature space (small \(\lambda\)); (c) low curvature space (large \(\lambda\)); (d) combined curvatures (red).

4.17 Two alignment examples (top and bottom) for the test set of Semantic3D. (a) The source and target line bundles \(\mathcal{L}_S, \mathcal{L}_T\); (b)-(c)-(d) \(\mathcal{L}_S\) rotated and translated with ground truth motor \(M\) (blue) and with estimated motor \(\tilde{M}\) in (b) high curvature space (small \(\lambda\)); (c) low curvature space (large \(\lambda\)); (d) combined curvatures (red).

5.1 Vector part of the (a) coefficient-wise ReLU \(\phi(\mathbf{x})\) (b) phase-dependent ReLU \(f(\mathbf{x})\) and (c) GA-ReLU \(\psi(\mathbf{x})\). The scalar part, not reported, is identical for all three functions and equal to \(\phi(s)\).

5.2 MSE error with error bars versus number of training data for Clifford ResNet with ReLU and GA-ReLU activation functions.

5.3 MSE error with error bars versus number of training data for Clifford FNO with ReLU and GA-ReLU activation functions.

5.4 Difference between ground truth and predicted scalar fields \(s_{t_{i+2}} - \hat{s}_{t_{i+2}}\) for 5 different time instants. Top row: GA-ReLU, bottom row: ReLU. Higher intensity is worse. Square boxes highlight areas of noticeable difference.

5.5 Difference between ground truth and predicted vector fields (i.e. quiver plots) \(\mathbf{u}_{t_{i+2}} - \hat{\mathbf{u}}_{t_{i+2}}\) for 5 different time instants. Top row: GA-ReLU, bottom row: ReLU. Higher intensity is worse. Square boxes highlight areas of noticeable difference.

5.6 The Fengbo architecture. Irregular geometries are discretised into fixed-resolution volumes of multivectors, over which Fengbo operates. It consists of three steps: (i) The geometry blocks operate on the geometry of the PDEs domain, capture local features, ensure grade mixing and upsample the inputs; (ii) The Clifford FNO establishes a mapping between the PDEs' geometry and their solution; (iii) The physics blocks operate on physical quantities, i.e. the target of the regression. The entire architecture sits in 3D Clifford Algebra, guaranteeing interpretability.

5.7 An example of pressure geometry multivector \(P\). It has a scalar component (the binary mask \(m_p\)), 3 vector components (the 3D coordinates \(\mathbf{p}\)) and 3 bivector components (the dual of the vectors \(\mathbf{n}\) normal to points \(\mathbf{p}\)).

5.8 An example of velocity geometry multivector \(V\). It has a scalar component (the binary mask \(m_v\)) and 3 vector components (the 3D coordinates \(\mathbf{v}\)).

5.9 (a) Ground truth pressure field (b) Fengbo's estimated pressure field and (c) their relative error for a test shape in the ShapeNet Car dataset.

5.10 (a) Ground truth pressure field (b) Fengbo's estimated pressure field and (c) their relative error for a test shape in the Ahmed Body dataset.

5.11 (a) Ground truth velocity field (b) Fengbo's estimated velocity field and (c) their relative error for a test shape in the ShapeNet Car dataset.

5.12 Intermediate outputs are interpretable physical quantities: \(Q'\), the output of the 3D Clifford FNO block, is processed by \(N_p = 2\) 3D Physics blocks in parallel to obtain \(\hat{\phi}(\mathbf{x})\) and \(\hat{\psi}(\mathbf{x})\), the pressure and velocity fields, respectively.

5.13 The three components of the Fengbo architecture.

5.14 Intermediate layer outputs from \(P\) to \(Q_P\) within the 3D Clifford Geometry block for a test sample in the ShapeNet Car dataset.

5.15 Ablation on the number of hidden channels \(C\).

5.16 Ablation on the number of FNO blocks \(F\).

5.17 Ablation on the number of Fourier modes \(m\).

5.18 Number of parameters (M) and model size (MB) as a function of \(C, R, m\).

5.19 Computational complexity comparison of Fengbo and Transolver [188] for the 3D case.

5.20 Relative \(L_2\) norm promotion versus computational complexity for Fengbo (red arrow and ellipse) and Transolver's (black arrow and ellipse) ablations.

5.21 3D GA approach: solving Maxwell's PDEs through Clifford ResNet, a ResNet-inspired network working in \(\mathcal{G}_{3,0,0}\).

5.22 STA approach: solving Maxwell's PDEs through STAResNet, our ResNet-inspired network working in \(\mathcal{G}_{1,3,0}\).

5.23 Training and validation losses versus number of epochs for 2D Maxwell's PDEs for instances sampled at (a) 25s, (b) 50s, (c) 75s, (d) 100s.

5.24 (a) Mean squared error and (b) correlation between estimated and ground truth EM fields in the test set for varying \(\Delta t\). We included real-valued ResNet for a more comprehensive comparison.

5.25 Ground truth (left column) shown against \(\mathbf{F}^2\) (right column) and \(F^2\) (centre column) over the test set for varying \(\Delta t\). Values are shown on a \(32 \times 32\) spatial grid.

5.26 The 5 different obstacle configurations. The 3 unseen geometries are highlighted.

5.27 Training and validation losses versus number of epochs for the PDE in the presence of obstacles, for a total of 5 different obstacle configurations during the training phase.

5.28 (a) Mean squared error and (b) correlation between estimated and ground truth fields over the two test sets with seen and unseen obstacle configurations.

5.29 Ground truth (left column) shown against \(\mathbf{F}^2\) (right column) and \(F^2\) (centre column) over the test set with seen obstacle configurations (number 5 of Fig. 5.26), shown in the top 2 rows, and unseen obstacle configurations (number 7 of Fig. 5.26), shown in the bottom 2 rows. Values are shown on a \(48 \times 48\) spatial grid for one chosen timepoint.

5.30 Test error over the estimated EM fields in the presence of seen and unseen obstacle geometries versus the number of trainable parameters.

5.31 (a) Mean squared error and (b) correlation between estimated and ground truth EM fields over the test set versus rollout steps \(m\) for the 2D case. Different colors refer to datasets with different \(\Delta t\) in between frames.

5.32 2D GT \(F^2, \mathbf{F}^2\), estimated \(F_{est} = \{\hat{F}^2, \hat{\mathbf{F}}^2\}\) and differences \(\Delta^2 = \{|F^2 - \hat{F}^2|, |\mathbf{F}^2 - \hat{\mathbf{F}}^2|\}\) for STAResNet and Clifford ResNet at different rollout steps: (a) \(m = 1\), (b) \(m = 2\), (c) \(m = 8\) and (d) \(m = 10\). Here \(\Delta t = 5s\). Red circles highlight regions of significant difference. Images are shown on a \(48 \times 48\) spatial grid.

5.33 2D GT \(F^2, \mathbf{F}^2\), estimated \(F_{est} = \{\hat{F}^2, \hat{\mathbf{F}}^2\}\) and differences \(\Delta^2 = \{|F^2 - \hat{F}^2|, |\mathbf{F}^2 - \hat{\mathbf{F}}^2|\}\) for STAResNet and Clifford ResNet at different rollout steps: (a) \(m = 1\), (b) \(m = 5\), (c) \(m = 8\) and (d) \(m = 10\). Here \(\Delta t = 8s\). Red circles highlight regions of significant difference. Images are shown on a \(48 \times 48\) spatial grid.

5.34 Training and validation losses versus number of epochs for 2D Maxwell's PDEs for instances sampled at (a) 5s, (b) 8s, (c) 10s, (d) 15s.

5.35 Mean squared error between estimated and ground truth EM fields over the test set versus rollout steps \(m\) for the 3D case. Different colors refer to datasets with different \(\Delta t\) in between frames.

5.36 Slices of the 3D GT \(F^2, \mathbf{F}^2\), estimated \(\hat{F}^2, \hat{\mathbf{F}}^2\) and differences \(|F^2 - \hat{F}^2|, |\mathbf{F}^2 - \hat{\mathbf{F}}^2|\) for STAResNet and Clifford ResNet at different heights \(z\) of the volume at different rollout steps: (a) \(m = 3\), (b) \(m = 5\). \(\Delta t = 5s\). Images are shown on a \(28 \times 28\) spatial grid.

5.37 Slices of the 3D GT \(F^2, \mathbf{F}^2\), estimated \(\hat{F}^2, \hat{\mathbf{F}}^2\) and differences \(|F^2 - \hat{F}^2|, |\mathbf{F}^2 - \hat{\mathbf{F}}^2|\) for STAResNet and Clifford ResNet at different heights \(z\) of the volume at different rollout steps: (a) \(m = 8\), (b) \(m = 10\). \(\Delta t = 5s\). Images are shown on a \(28 \times 28\) spatial grid.

List of tables

1.1 Summary of common geometric transformations in GA and CGA.

1.2 Objects in CGA.

1.3 Blades in Spacetime Algebra (STA) \(\mathcal{G}_{1,3,0}\). Elements colored in red square to +1, while elements colored in blue square to -1.

1.4 Embedding an architecture in a GA \(\mathcal{G}\) enables geometric deep learning on any of the 5 G's of [20]. Left column: architectures on which geometric deep learning is performed; right column: example of possible embedding of such architectures in GA. In this thesis, we provide examples of GA-based learning for grids (Chapter 5), groups (Chapters 3-4) and geodesics (Chapter 4).

2.1 Average Geodesic Error (\(^\circ\)), Sanity Check, \(M \mapsto \mathcal{R}\).

2.2 Geodesic Error (\(^\circ\)), Sanity Check, \(R \mapsto B\).

2.3 Geodesic Error (\(^\circ\)), Pose Estimation from 3D Point Clouds.

2.4 Geodesic Error (\(^\circ\)), Inverse Kinematics.

2.5 Euclidean Distance (cm), Inverse Kinematics.

2.6 Comparison of rotation representations and convergence speed, expressed as number of iterations of the EA, for Gly-Gly and Gly-Phe dipeptides.

3.1 Combinations of orientational features. The column "Planes" specifies how many planes are required to build the corresponding set of features. The column "K" indicates the total number of adjacency matrices of the graph.

3.2 GDT_TS scores over the five datasets.

3.3 GDT_HA scores over the five datasets.

3.4 GDT_TS score for 10 example proteins (GDT_HA score in parenthesis).

3.5 GDT_TS scores over the PSICOV150 dataset for different 3D projection strategies.

3.6 GDT_HA scores over the PSICOV150 dataset for different 3D projection strategies.

4.1 Number of trainable parameters for the three approaches.

4.2 Median translation and rotation errors over the test set for the 7 approaches.

4.3 Ablation study with different backbones for selected datasets. Results superior to the best PoseNet strategy are in bold.

4.4 Number of Parameters for Different Methods.

4.5 Results for the correspondence-free and correspondence-based line registration problem.

4.6 Relative performance of DRA compared to other methods. Positive values (in dark green) indicate a promotion (i.e. DRA's error is lower), while negative values (in red) indicate a demotion (i.e. DRA's error is higher).

5.1 The three steps of the Fengbo pipeline.

5.2 Training and testing errors in pressure and velocity prediction on the ShapeNet Car dataset. Fengbo results have been obtained with \(\alpha, \beta = \{5, 1\}\) for pressure and with \(\alpha, \beta = \{1, 50\}\) for velocity (see Section 5.4.6).

5.3 Training and testing errors in pressure prediction on the Ahmed Body dataset.

5.4 Comparison of different models. \(d\) is the maximum degree of the graph, \(D\) is the feature space dimensionality. *: See notes on computational complexity.

5.5 Comparison of Fengbo's 3D and 2D configurations.

5.6 Data representations and their processing steps with the Fengbo pipeline.

5.7 Ablation on the impact of \(M\). Reporting the relative \(L_2\) norm.

5.8 Ablation on the impact of \(\alpha, \beta\). Reporting the relative \(L_2\) norm.

5.9 Complexity comparison for different datasets.

Nomenclature

Number Sets

\(\mathbb{C}\) Complex numbers
\(\mathbb{H}\) Quaternions
\(\mathbb{N}\) Natural numbers (non-negative integers)
\(\mathbb{R}\) Real numbers
\(\mathbb{Z}\) Integers (positive, negative, and zero)

Mathematical Symbols

\(\alpha\) Scaling factor
\(\beta\) Scaling factor
\(\Omega\) Dihedral angle \(\omega\) map
\(\Phi\) Dihedral angle \(\phi\) map
\(\Psi\) Dihedral angle \(\psi\) map
\(\cdot\) Inner product
\(\gamma\) Angular misalignment angle
\(\gamma_i\) STA basis vector
\(\lambda\) 1d-Up Space curvature
\(\langle \cdot \rangle_k\) \(k\) -grade extractor operator
\(\mathbb{E}\) Expected value
\(\mathbf{C}\) Contact map
\(\mathbf{D}\) Distance map
\(\mathbf{M}\) Cost map
\(\mathbf{N}\) Oriented point map
\(\mathbf{R}\) Rotation matrix
\(\mathbf{R}_\phi\) Rotation matrix rotating by angle \(\phi\)
\(\mathbf{t}\) Translation vector
\(\mathbf{v}\) or \(v(\mathbf{x})\) Optical Flow
\(\mathcal{G}_{p,q,r}\) or \(G(p, q, r)\) Geometric Algebra with \(p\) vectors that square to 1, \(q\) vectors that square to -1 and \(r\) vectors that square to 0
\(\mathcal{L}\) Loss function
\(\mathcal{N}\) Gaussian distribution
\(\mathcal{U}\) Uniform distribution
\(\nabla\) Gradient operator
\(\phi\) Rotation angle between predicted and ground truth vectors
\(\sigma\) Sigmoid function
\(\theta\) Learned network parameters
\(\sim\) Reverse operator
\(\wedge\) Outer product
\(e_i\) GA basis vector
\(I_n\) or \(I\) Pseudoscalar
\(M\) Motor
\(R\) Rotor
\(T\) Translation rotor

Acronyms / Abbreviations

Adam Adaptive Moment Estimation
BN Batch Normalization
CGA Conformal Geometric Algebra
CGENN Clifford Group Equivariant Neural Network
CLIP Contrastive Language-Image Pretraining
CNN Convolutional Neural Network
DDPM Denoising Diffusion Probabilistic Model
DDVM Denoising Diffusion Vision Model
DE Differential Evolution
DL Deep Learning
DoF Degrees of Freedom
DRA Define, Refine, Align
EA Evolutionary Algorithm
FNO Fourier Neural Operator
GA Geometric Algebra
GA-DDVM Geometric Algebra Denoising Diffusion Vision Model
GCAN Geometric Clifford Algebra Network
GDT_HA Global Distance Test - High Accuracy
GDT_TS Global Distance Test - Total Score
GELU Gaussian Error Linear Unit
GNO Graph Neural Operator
GrT Graph Transformer
GT Ground Truth

LN Layer Normalization
LSTM Long Short-Term Memory
MAE Mean Absolute Error
ML Machine Learning
MLP Multilayer Perceptron
MoCap Motion Capture Database
MSE Mean Squared Error
MVL Multivector Linear Layer
NMR Nuclear Magnetic Resonance
NN Neural Network
NO Neural Operator
OFE Optical Flow Estimation
P-FCGP Pairwise Fully Connected Geometric Product Layer
PDB Protein Data Bank
PDE Partial Differential Equation
PSP Protein Structure Prediction
ReLU Rectified Linear Unit
SOTA State of the Art
SSIM Structural Similarity Index Measure
STA Spacetime Algebra
SVD Singular Value Decomposition
T-FCGP Triplewise Fully Connected Geometric Product Layer
ViT Vision Transformer
VLM Vision-Language Model

Chapter 1

Introduction and Background

I thought I had something so simple
to say. Something useful to
everybody.


— Marcello Mastroianni as Guido,
\(8\frac{1}{2}\) , dir. Federico Fellini (1963)

1.1 Introduction

Geometric Algebra (GA) is a mathematical framework based on the algebra of William Kingdon Clifford. GA is a powerful tool as it offers an intuitive manipulation of geometrical objects (scalars, vectors, oriented surfaces and volumes, etc.) and a simple way to perform geometric transformations by any amount and in any dimension.

Historically, GA has been more successful in some fields than in others. Researchers and engineers in theoretical physics and computer graphics, for example, have been making use of GA for several years. Experiments in GA for Machine Learning (ML) and Deep Learning (DL), on the other hand, have only started to appear in the literature in the last couple of years, despite early notions of GA-based ML having been proposed as early as the 1990s. This is surprising, since many ML problems deal with intrinsically geometric data, and such problems are often solved with engineering-intensive approaches which, albeit effective, are very complex and not at all intuitive.

The goal of this dissertation is to bridge the gap between the GA and ML worlds, and to show how GA can improve on and simplify different aspects of a typical ML pipeline. These aspects include, among others, representing geometrical objects as data in an intuitive, meaningful way; understanding intermediate outputs of a neural network; and boosting its accuracy at a reduced computational cost. These challenges are recurrent problems in the broader Deep Learning community that can be addressed elegantly and effectively using GA.

While it could be argued that GA is merely one alternative among several, more widely recognized approaches in the literature, the versatility of GA is undeniable. This thesis aims to demonstrate that with a few simple GA tools, a wide variety of problems, ranging from bioinformatics to computer vision and engineering, can be tackled using the same methods and the same underlying philosophy.

At the outset of the PhD project, the goal was to show how GA could simplify the handling of proteins. However, it quickly became clear that the same methods applicable to proteins could be extended to a broad range of other geometric objects. This insight revealed that GA could serve as a valuable tool not only in bioinformatics, but also within the larger field of geometric deep learning.

This thesis constitutes a summary of works published starting from September 2021.

  • The first half of Chapter 2 has been published as:

    • Pepe, Alberto, Joan Lasenby, and Pablo Chacón. "Learning rotations." Mathematical Methods in the Applied Sciences 47.3 (2024): 1204–1217 [143].
    • The second half is a joint work with Scuola Normale Superiore in Pisa, Italy, and has been presented as: Pepe, Alberto, Federico Lazzari, Vincenzo Barone, Joan Lasenby. "Molecular Geometry Optimization through Rotor-based Evolutionary Algorithm", 13th International Conference on Clifford Algebras and Their Applications in Mathematical Physics (ICCA13), Holon, Israel (2023).
  • Elements of Chapter 3 have been published as:

    • Pepe, Alberto, Joan Lasenby, and Pablo Chacon. "Geometric Algebra Models of Proteins for Three-Dimensional Structure Prediction: A Detailed Analysis." International Conference on Advanced Computational Applications of Geometric Algebra (ICACGA). Cham: Springer Nature Switzerland, 2022 [143, 144].
    • Pepe, Alberto, and Joan Lasenby. "Modeling orientational features via Geometric Algebra for 3D protein coordinates prediction." Mathematical Methods in the Applied Sciences 47.14 (2024): 11385–11404 [141].
    • Pepe, Alberto, Joan Lasenby, and Pablo Chacon. "Using a graph transformer network to predict 3D coordinates of proteins via Geometric Algebra modelling." International Workshop on Empowering Novel Geometric Algebra for Graphics and Engineering. Cham: Springer Nature Switzerland, 2022 [145].
    • Pepe, Alberto, Sven Buchholz, and Joan Lasenby. "Clifford Group Equivariant Neural Network Layers for Protein Structure Prediction." Northern Lights Deep Learning Conference. PMLR, 2024 [137].

  • Chapter 4 has been published as:

    • Pepe, Alberto, and Joan Lasenby. "CGAPoseNet: Camera pose regression via a 1D-up approach to Conformal Geometric Algebra." arXiv preprint arXiv:2302.05211 (2023) [140].
    • Pepe, Alberto, Joan Lasenby, and Sven Buchholz. "CGAPoseNet+ GCAN: A Geometric Clifford Algebra Network for Geometry-aware Camera Pose Regression." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024 [142].
    • Pepe, Alberto, Yuxin Yao, and Joan Lasenby. "Define, Refine, Align: Correspondence-free 3D Line Alignment with Attentional, Equivariant and Rotational Layers." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Perception Beyond the Visible Spectrum (PBVS), 2025.
  • Elements of Chapter 5 have been published as:

    • Pepe, Alberto, Sven Buchholz, and Joan Lasenby. "GA-ReLU: an activation function for Geometric Algebra Networks applied to 2D Navier–Stokes PDEs." ICLR 2024 Workshop on AI4DifferentialEquations In Science, 2024 [138].
    • Pepe, Alberto, Sven Buchholz, and Joan Lasenby. "STAResNet: a Network in Spacetime Algebra to solve Maxwell's PDEs." arXiv preprint arXiv:2408.13619 (2024) [139].
    • Pepe, Alberto, Mattia Montanari, and Joan Lasenby. "Fengbo: A Clifford Neural Operator pipeline for 3D PDEs in Computational Fluid Dynamics." ICLR 2025.

1.2 Fundamentals of GA

The branch of mathematics called Geometric Algebra (GA) was developed in the second half of the 19th century by William Kingdon Clifford [33]. GA was revived in 1966 thanks to David Hestenes, who reiterated the unifying power of GA and its relevance in physics due to the close ties with geometry [68]. GA provides an intuitive and unified framework for describing geometric entities and transformations acting on them, and it has found application in computer graphics, computer vision, robotics, physics and more. In the rest of this section we will introduce the fundamental operators of GA and the subalgebras that we employed throughout the thesis. For a more rigorous derivation we refer the reader to [103, 43]. All plots in this section have been generated using the Clifford package [61].

1.2.1 Defining a subalgebra

A Geometric Algebra of size \(m\) can be defined over a scalar field and a set of \(m\) independent, orthonormal basis vectors \(\{e_i\}_{i=1,\dots,m}\) . We indicate a generic closed subalgebra with \(\mathcal{G}_{p,q,r}\) or alternatively \(G(p,q,r)\) . A closed subalgebra \(\mathcal{G}_{p,q,r}\) has \(p\) basis vectors that square to 1, \(q\) basis vectors that square to -1 and \(r\) basis vectors that square to 0.
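To make the definition concrete, here is a minimal, illustrative sketch of how basis blades multiply in an arbitrary \(\mathcal{G}_{p,q,r}\) (our own helper code, written from the definitions above; the thesis itself uses the `clifford` Python package for its plots):

```python
# Minimal sketch of G(p, q, r): a basis blade is a sorted tuple of
# basis-vector indices, and sig[i] stores what e_i squares to (+1, -1 or 0).
# Illustrative code, not the dissertation's own implementation.

def blade_product(a, b, sig):
    """Geometric product of two basis blades, e.g. (1,) * (1, 2) -> (+1, (2,))."""
    sign, idx = 1, list(a) + list(b)
    # bubble-sort the concatenated indices, flipping the sign per swap
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    # contract repeated indices using the signature e_i^2 = sig[i]
    out, k = [], 0
    while k < len(idx):
        if k + 1 < len(idx) and idx[k] == idx[k + 1]:
            sign *= sig[idx[k]]
            k += 2
        else:
            out.append(idx[k])
            k += 1
    return sign, tuple(out)

# G(1, 1, 1): e1^2 = +1, e2^2 = -1, e3^2 = 0
sig = {1: 1, 2: -1, 3: 0}
squares = {i: blade_product((i,), (i,), sig)[0] for i in sig}
print(squares)  # {1: 1, 2: -1, 3: 0}
```

Representing blades as sorted index tuples makes the signature \(e_i^2 \in \{+1, -1, 0\}\) the only free parameter of the algebra.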

1.2.2 The geometric product

Elements in a GA are called multivectors. Elements of any type can be added or multiplied together. Each element has a grade associated with it: by grade we mean the dimension of the subspace an object specifies, e.g. scalars are grade 0, vectors are grade 1, bivectors are grade 2, etc. Geometric Algebra gets its name from the geometric product. The geometric product between two vectors is given by

\[ab = a \cdot b + a \wedge b \quad (1.1)\]

in which the scalar (or inner) product \(a \cdot b\) is the usual scalar product of linear algebra, equal to the cosine of the angle between \(a\) and \(b\) when \(a, b\) are unit vectors, while the wedge (or outer) product \(a \wedge b\) produces a bivector (i.e. an oriented plane). The geometric product of two vectors is hence the sum of a scalar and a bivector, which have different grades. Any multivector of a single grade \(r\) that can be written as

\[A = a_1 \wedge a_2 \wedge \dots \wedge a_r \quad (1.2)\]

is called a blade.
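Writing \(a = a_1e_1 + a_2e_2\) and \(b = b_1e_1 + b_2e_2\) in \(\mathcal{G}_{2,0,0}\) , Eq. (1.1) reduces to \(ab = (a_1b_1 + a_2b_2) + (a_1b_2 - a_2b_1)e_{12}\) . A small illustrative check that, for unit vectors, the scalar part is the cosine of the angle between them:

```python
import math

def geometric_product_2d(a, b):
    """ab = a.b + a^b for 2D vectors: returns (scalar part, e12 coefficient)."""
    inner = a[0] * b[0] + a[1] * b[1]   # a . b
    outer = a[0] * b[1] - a[1] * b[0]   # coefficient of a ^ b on e12
    return inner, outer

# two unit vectors separated by 60 degrees
a = (1.0, 0.0)
b = (math.cos(math.pi / 3), math.sin(math.pi / 3))
inner, outer = geometric_product_2d(a, b)
print(round(inner, 6), round(outer, 6))  # 0.5 0.866025  (cos 60°, sin 60°)
```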

The reversion operator is denoted by the tilde, \(\tilde{A}\) . The reverse of a scalar is equal to the scalar itself and the reverse of a vector is equal to the vector itself. For general multivectors \(A, B\) , we have that

\[\begin{aligned}\widetilde{AB} &= \tilde{B}\tilde{A} \\ \widetilde{A + B} &= \tilde{A} + \tilde{B}\end{aligned}\quad (1.3)\]

The general rule to reverse an \(r\) -blade is given by

\[\tilde{A}_r = (-1)^{\frac{r(r-1)}{2}} A_r \quad (1.4)\]
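The resulting sign pattern \(+, +, -, -, +, +, \dots\) (scalars and vectors are their own reverses, bivectors and trivectors change sign) can be tabulated with a one-line helper (illustrative):

```python
def reverse_sign(r):
    """Sign picked up when reversing an r-blade: + + - - + + - - ..."""
    return (-1) ** (r * (r - 1) // 2)

signs = [reverse_sign(r) for r in range(6)]
print(signs)  # [1, 1, -1, -1, 1, 1]
```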

The grade projection operator is denoted by \(\langle A \rangle_r\) , where \(r\) is the grade we want to extract from \(A\) . This comes from the fact that a multivector in an \(m\) -dimensional algebra can be written as

\[A = \sum_{i=0}^m \langle A \rangle_i \quad (1.5)\]

The geometric product between a multivector and its reverse gives the squared magnitude: \(|A|^2 = \langle A\tilde{A} \rangle_0\) . The reversion operator can be used to define the inverse of a multivector as

\[A^{-1} = \frac{\tilde{A}}{|A|^2} \quad (1.6)\]

It can be easily shown that \(A^{-1}A = 1\) . The dual of a multivector is defined as

\[A^* = AI_n^{-1} \quad (1.7)\]

where \(I_n\) is called the pseudoscalar, defined as \(I_n = e_1 \wedge e_2 \wedge \dots \wedge e_n\) . The pseudoscalar is the highest-grade element in a GA. The product of the grade- \(n\) pseudoscalar \(I_n\) and a grade- \(r\) multivector \(A_r\) is a grade- \((n-r)\) multivector, and is called a duality transformation. The pseudoscalar interchanges inner and outer products:

\[A_r \cdot (B_s I) = \langle A_r B_s I \rangle_{n-(r+s)} = \langle A_r B_s \rangle_{r+s} I = (A_r \wedge B_s) I \quad (1.8)\]
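For a vector \(a\) in \(\mathcal{G}_{2,0,0}\) these definitions reduce to simple coordinate formulas: \(\tilde{a} = a\) , so \(a^{-1} = a/|a|^2\) , and the dual \(a^* = aI_2^{-1}\) is a vector orthogonal to \(a\) . A small illustrative check (our own coordinate conventions):

```python
def gp2(a, b):
    """Geometric product of 2D vectors: (scalar, e12) parts."""
    return (a[0] * b[0] + a[1] * b[1], a[0] * b[1] - a[1] * b[0])

def inverse(a):
    """a^{-1} = reverse(a) / |a|^2, and for a vector reverse(a) = a."""
    n2 = a[0] ** 2 + a[1] ** 2
    return (a[0] / n2, a[1] / n2)

def dual(a):
    """a* = a I2^{-1} with I2 = e12, I2^{-1} = -e12: (a1, a2) -> (a2, -a1)."""
    return (a[1], -a[0])

a = (1.0, 2.0)
print(gp2(inverse(a), a))   # (1.0, 0.0): a^{-1} a = 1
print(dual(a))              # (2.0, -1.0), orthogonal to a
```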

1.2.3 Geometric Algebra of the plane and of the space

The GA of the Euclidean plane is denoted by \(\mathcal{G}_{2,0,0}\) , with two basis vectors \(e_1, e_2\) . Being an \(n = 2\) dimensional GA (since \(p + q + r = 2 + 0 + 0 = 2 = n\) ), it is spanned by \(2^2 = 4\) elements, namely a scalar, two vectors \(e_1, e_2\) and the bivector \(e_1 e_2 = e_1 \wedge e_2\) .

![Fig. 1.1](02f9c911b69504d90bd20e0bc61c4bbb_img.jpg)

Fig. 1.1 Elements of \(\mathcal{G}_{3,0,0}\) , the GA of Euclidean space: one scalar, three vectors \(e_1, e_2, e_3\) , three bivectors \(e_{12}, e_{23}, e_{13}\) and one trivector \(e_{123}\) .

\(\mathcal{G}_{2,0,0}\) includes the concept of complex numbers, since the pseudoscalar of this algebra \(I = e_1 \wedge e_2 = e_1 e_2 \equiv e_{12}\) squares to \(-1\) : \(I^2 = e_{12} e_{12} = (e_1 e_2)(e_1 e_2) = -(e_1 e_1)(e_2 e_2) = -1\) . A scalar plus a bivector can be seen as a representation of a complex number, since \(Z = a + Ib \equiv a + \iota b\) , where \(\iota\) is the imaginary unit. Similarly, any vector can be written as \(X = ae_1 + be_2 = e_1(a + bI) = e_1 Z\) .
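The identification of \(Z = a + Ib\) with a complex number can be checked directly: multiplying scalar-plus-bivector elements with \(I^2 = -1\) reproduces complex multiplication (illustrative sketch):

```python
def mul_even(z, w):
    """Product of (scalar + bivector) elements a + b*e12, using e12^2 = -1."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

z, w = (1.0, 2.0), (3.0, -1.0)
ga = mul_even(z, w)
cx = complex(*z) * complex(*w)
print(ga, cx)  # (5.0, 5.0) (5+5j)
```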

Adding a third basis vector \(e_3\) we form \(\mathcal{G}_{3,0,0}\) , the GA of Euclidean space (see Fig. 1.1). It has \(2^3 = 8\) elements, a scalar, three vectors, three bivectors ( \(e_{12}, e_{23}, e_{13}\) ) and one trivector ( \(e_{123} = e_1 \wedge e_2 \wedge e_3 = I_3\) , the pseudoscalar). The GA of space includes the quaternion algebra, since a quaternion \(\mathbf{q} = w + a\mathbf{i} + b\mathbf{j} + c\mathbf{k}\) can be represented as a multivector \(A = w + ae_{12} + be_{13} + ce_{23}\) .

Throughout this thesis, the GA notation will be employed for the majority of mathematical quantities and geometric objects, unless explicitly stated otherwise. For example, lowercase, non-bold letters will be used to represent vectors, uppercase letters will denote bivectors and so on.

1.2.4 Geometric transformations

The concepts below assume a three-dimensional GA, but can be extended to any GA of arbitrary dimension.

Reflections

The reflection of a vector \(a\) in the plane orthogonal to the unit vector \(n\) is given by \(a' = a_{\perp} - a_{\parallel}\) . Hence, we first resolve \(a\) into its parallel and perpendicular components to \(n\) . This is done as \(a = n^2 a = n(n \cdot a + n \wedge a) = a_{\parallel} + a_{\perp}\) , in which \(a_{\parallel} = (a \cdot n)n\) and \(a_{\perp} = n(n \wedge a)\) . This is equivalent to

\[a' = n(n \wedge a) - n(a \cdot n) = -n(a \cdot n + a \wedge n) = -nan \quad (1.9)\]

![Fig. 1.2](9612f08d343bcc2a11b174db64ba7b3a_img.jpg)

Fig. 1.2 Reflection in Geometric Algebra: a multivector \(\Pi\) (e.g., a plane represented as a bivector) is reflected across a plane with unit normal vector \(n\) using the formula \(\Pi' = -n\Pi n\) . This reflection preserves the grade of \(\Pi\) .

The formula \(a' = -nan\) works for spaces in any dimension and it leaves lengths and angles unchanged. For bivectors \(B = a \wedge b\) , we have that

\[B' = (-nan) \wedge (-nbn) = \frac{1}{2}(nannbn - nbnnan) = \frac{1}{2}n(ab - ba)n = nBn \quad (1.10)\]

Sandwiching a multivector between two copies of a vector always preserves the grade of the multivector (see Fig. 1.2). The dual vector \(IB\) is subject to the same transformation: \(I(nBn) = n(IB)n\) .
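Expanding \(a = a_\parallel + a_\perp\) shows that \(-nan = a - 2(a \cdot n)n\) for a unit vector \(n\) , which gives a quick coordinate-level check of the reflection formula (an illustrative sketch, not the thesis's code):

```python
def reflect(a, n):
    """a' = -n a n = a - 2 (a.n) n, reflection in the plane orthogonal to unit n."""
    d = sum(ai * ni for ai, ni in zip(a, n))
    return tuple(ai - 2 * d * ni for ai, ni in zip(a, n))

a, n = (1.0, 2.0, 3.0), (0.0, 0.0, 1.0)
ap = reflect(a, n)
print(ap)                    # (1.0, 2.0, -3.0): the component along n is flipped
print(reflect(ap, n) == a)   # True: reflecting twice recovers a
```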

Rotations

A rotation in the plane is generated by successive reflections in the planes perpendicular to two unit vectors \(m\) and \(n\) . Let's consider two reflections: \(b = -mam\) , and then \(c = -nbn = -n(-mam)n = nmamn\) . We define \(R = nm\) , so that we can write \(c = Ra\tilde{R}\) , representing a rotation. \(R\) is called a rotor: the geometric product of two unit vectors (see Fig. 1.3). We can rewrite it as

\[R = nm = n \cdot m + n \wedge m = \cos(\theta) + n \wedge m \quad (1.11)\]

![Fig. 1.3](419237decd8294588ca7ca2aa84b0e75_img.jpg)

Fig. 1.3 Rotation in Geometric Algebra: a multivector \(\Pi\) is rotated by applying two successive reflections across planes with unit normals \(n\) and \(m\) . The rotation is given by \(\Pi'' = mn\Pi nm = R\Pi\tilde{R}\) , and preserves the grade of \(\Pi\) . The geometric product \(R = mn\) is called a rotor.

The squared magnitude of the bivector part is \(\sin^2(\theta)\) , where \(\theta\) is the angle between \(m\) and \(n\) . We define the unit bivector \(B\) in the \(m \wedge n\) plane as

\[B = \frac{m \wedge n}{\sin(\theta)} \quad (1.12)\]

with \(B^2 = -1\) .

Recalling the complex number duality, the rotor can be written as \(R = \cos(\theta) - B \sin(\theta)\) , or equivalently \(R = \exp(-B\theta)\) , in which the exponential function is defined by its Taylor expansion with the multiplications replaced by the geometric product. To rotate by an angle \(\theta\) , we use \(R = \exp(-B\theta/2)\) . Summarizing, the rotation operation is

\[a' = Ra\tilde{R} = e^{-B\theta/2}ae^{B\theta/2} \quad (1.13)\]

where \(R\tilde{R} = 1\) . Rotations preserve lengths and angles:

\[a' \cdot b' = \frac{1}{2}(Ra\tilde{R}Rb\tilde{R} + Rb\tilde{R}Ra\tilde{R}) = \frac{1}{2}R(ab + ba)\tilde{R} = (a \cdot b) R\tilde{R} = a \cdot b \quad (1.14)\]

The reverse of the rotation is given by \(a = \tilde{R}a'R\) .
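Eq. (1.13) can be exercised numerically. The illustrative sketch below (our own minimal multivector representation, not the thesis's code) rotates \(e_1\) by \(\theta = 90^\circ\) in the \(e_{12}\) plane:

```python
import math

def blade_product(a, b, sig):
    """Geometric product of basis blades (sorted index tuples) with signature sig."""
    sign, idx = 1, list(a) + list(b)
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    out, k = [], 0
    while k < len(idx):
        if k + 1 < len(idx) and idx[k] == idx[k + 1]:
            sign *= sig[idx[k]]
            k += 2
        else:
            out.append(idx[k])
            k += 1
    return sign, tuple(out)

def gp(A, B, sig):
    """Geometric product of multivectors stored as {blade: coefficient}."""
    C = {}
    for ba, ca in A.items():
        for bb, cb in B.items():
            s, blade = blade_product(ba, bb, sig)
            C[blade] = C.get(blade, 0.0) + s * ca * cb
    return {k: v for k, v in C.items() if abs(v) > 1e-12}

def reverse(A):
    """Reverse: an r-blade picks up the sign (-1)^(r(r-1)/2)."""
    return {b: c * (-1) ** (len(b) * (len(b) - 1) // 2) for b, c in A.items()}

sig = {1: 1, 2: 1}                                   # G(2,0,0)
theta = math.pi / 2
# rotor R = exp(-e12 theta/2) = cos(theta/2) - sin(theta/2) e12
R = {(): math.cos(theta / 2), (1, 2): -math.sin(theta / 2)}
a = {(1,): 1.0}                                      # the vector e1
a_rot = gp(gp(R, a, sig), reverse(R), sig)           # sandwich R a R~
print({b: round(c, 6) for b, c in a_rot.items()})    # {(2,): 1.0}: e1 -> e2
```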

The single \(R\) acting by itself on a vector has no geometric significance: the formula \(b = Ra\) only works for vectors in the plane of rotation. For this reason, we say that GA has a double-cover representation of the rotation group: since the operation is double sided, \(R\) and \(-R\) yield the same rotation. Multivectors can be rotated with the same principle (see Fig. 1.4). For the group combination rule, if \(b = R_1 a \tilde{R}_1\) and \(c = R_2 b \tilde{R}_2\) , then \(c = R a \tilde{R}\) , where \(R = R_2 R_1\) .

![Fig. 1.4](f3ce2d7158eb708e3487b8e35415db35_img.jpg)

Fig. 1.4 Visualization of some \(90^\circ\) rotations in 3D Geometric Algebra using sandwich products. The orange objects (vector, circle, plane, and point) are rotated to the blue counterparts by the sandwich product \(X \mapsto RX\tilde{R}\) , where \(\tilde{R}\) is the reverse of \(R\) . Rotation via the sandwich product is independent of the type of geometric object \(X\) , applying uniformly to vectors, bivectors, and points alike.

In the rest of this thesis we will represent rotations in the Euclidean space in terms of rotors and bivectors, and employing two kinds of rotor-to-bivector maps, namely (i) the exponential map, for which

\[B = -2 \log R \quad (1.15)\]

and (ii) the Cayley transform [68], for which

\[B = \frac{1 - R}{1 + R} \quad (1.16)\]
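In a single plane of rotation, the even subalgebra is isomorphic to \(\mathbb{C}\) (with \(e_{12} \leftrightarrow \iota\) ), so both maps can be sketched with complex arithmetic. The snippet below is our own illustrative check, not the thesis's code:

```python
import cmath, math

theta = 0.8
R = cmath.exp(-1j * theta / 2)     # rotor exp(-e12 theta/2), identifying e12 with 1j

B_exp = -2 * cmath.log(R)          # exponential map, Eq. (1.15)
B_cay = (1 - R) / (1 + R)          # Cayley transform, Eq. (1.16)

# both maps return a pure bivector (purely imaginary under this identification)
print(round(B_exp.imag, 6), round(B_exp.real, 6))  # 0.8 0.0: recovers theta * e12
print(round(B_cay.real, 6))                        # 0.0
print(round(B_cay.imag, 6), round(math.tan(theta / 4), 6))  # same value twice
```

Note that the two maps agree on the plane of the bivector but not on its magnitude: the Cayley transform compresses the angle through a tangent rather than returning it linearly.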

1.2.5 Conformal Geometric Algebra (CGA)

Let \(x\) be a vector in a Geometric Algebra \(\mathcal{G}_{p,q,r}\) of dimension \(n = p + q + r\) . CGA extends the Geometric Algebra to \(\mathcal{G}_{p+1,q+1,r}\) by introducing two basis vectors, \(e\) and \(\bar{e}\) , with \(e^2 = +1\) and \(\bar{e}^2 = -1\) . Since \(e_i \cdot e = e_i \cdot \bar{e} = 0\) , then also \(x \cdot e = x \cdot \bar{e} = 0\) . Having introduced \(e\) and \(\bar{e}\) , we can create the null vectors following Hestenes' notation:

\[\begin{aligned} n &= e + \bar{e} \\ \bar{n} &= e - \bar{e} \end{aligned} \quad (1.17)\]

An alternative notation, more common today, is given by

\[\begin{aligned} n_\infty &= e + \bar{e} \\ n_0 &= \frac{1}{2}(\bar{e} - e) \end{aligned} \quad (1.18)\]

which represent the point at infinity and the origin, respectively. They are null vectors, since \(n_\infty^2 = (e + \bar{e})(e + \bar{e}) = e^2 + 2e \cdot \bar{e} + \bar{e}^2 = 1 + 0 - 1 = 0\) . Moreover, we can show that \(n \cdot \bar{n} = e^2 - \bar{e}^2 = 2\) and that \(x \cdot n_\infty = x \cdot n_0 = 0\) . CGA defines a mapping of the kind

\[x \in \mathcal{G}_{p,q,r} \longrightarrow F(x) \in \mathcal{G}_{p+1,q+1,r} \quad (1.19)\]

in which \(F(x)\) is defined, using Hestenes' notation, as

\[F(x) = x^2 n + 2x - \bar{n} \quad (1.20)\]

or, alternatively, as

\[\begin{aligned} F(x) &= -\frac{1}{2}(x - e)n_\infty(x - e) \\ F(x) &= \frac{1}{2}(x^2 n_\infty + 2x - n_0) \end{aligned} \quad (1.21)\]

The factor of \(1/2\) is chosen so that \(F(x) \cdot n_\infty = -1\) . \(F(x)\) is a null vector for any \(x\) . A single point \(x \in \mathcal{G}_{p,q,r}\) corresponds to a family of null vectors \(X = \lambda(x^2 n_\infty + 2x - n_0) \in \mathcal{G}_{p+1,q+1,r}\) . If \(x = 0\) , then \(F(x) = -\frac{1}{2}n_0\) , i.e. the origin, while points at infinity map to multiples of \(n_\infty\) . If we are dealing with a 3D space, the equivalent CGA will be \(\mathcal{G}_{4,1,0}\) , i.e. a 5D space.
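Using Hestenes' form \(F(x) = x^2 n + 2x - \bar{n}\) of Eq. (1.20), the image of \(x \in \mathbb{R}^3\) has coordinates \((2x_1, 2x_2, 2x_3, x^2 - 1, x^2 + 1)\) on the basis \((e_1, e_2, e_3, e, \bar{e})\) , so nullity can be checked with the metric \(\mathrm{diag}(1, 1, 1, 1, -1)\) . An illustrative sketch:

```python
def F(x):
    """Hestenes conformal embedding F(x) = x^2 n + 2x - nbar,
    written as coordinates on (e1, e2, e3, e, ebar)."""
    x2 = sum(xi * xi for xi in x)
    return [2 * x[0], 2 * x[1], 2 * x[2], x2 - 1, x2 + 1]

METRIC = [1, 1, 1, 1, -1]   # e^2 = +1, ebar^2 = -1

def norm2(v):
    """Squared norm v.v under the CGA metric."""
    return sum(m * vi * vi for m, vi in zip(METRIC, v))

for x in [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (-0.5, 0.25, 4.0)]:
    print(x, norm2(F(x)))   # every F(x) has zero norm: it is a null vector
```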

1.2.6 Transformations in CGA

Rotations

Recall that in \(\mathcal{G}_{p,q,r}\) a rotation is expressed by \(x \rightarrow Rx\tilde{R}\) . It can be shown that the same sandwich rotates a conformal object, \(F(x) \rightarrow RF(x)\tilde{R}\) :

\[RF(x)\tilde{R} = \frac{1}{2}R(x^2n_\infty + 2x - n_0)\tilde{R} = \frac{1}{2}[x^2Rn_\infty\tilde{R} + 2Rx\tilde{R} - Rn_0\tilde{R}] \quad (1.22)\]

A rotor contains only even-grade blades built from the Euclidean basis vectors, so it commutes with \(n_\infty\) . Moreover, \(Rn_0\tilde{R} = n_0\) . We hence have:

\[RF(x)\tilde{R} = \frac{1}{2}(\hat{x}^2n_\infty + 2\hat{x} - n_0) \quad (1.23)\]

where \(\hat{x} = Rx\tilde{R}\) .

Inversions

By inversion we mean the mapping \(x \rightarrow x/x^2\) . It can be shown that reflecting \(F(x)\) (in the Hestenes form of Eq. 1.20) in \(e\) gives:

\[-eF(x)e = -[x^2 ene + 2exe - e\bar{n}e] = -[x^2\bar{n} - 2x - n] = x^2\left[\frac{1}{x^2}n + 2\frac{x}{x^2} - \bar{n}\right] = x^2 F\left(\frac{x}{x^2}\right) \quad (1.24)\]

The scale and the sign are irrelevant, so we can safely discard the minus sign and define the inversion as \(e(\cdot)e\) (see Fig. 1.5).

![Fig. 1.5](1c2028183a35357e7238438a4af9cab7_img.jpg)

Fig. 1.5 Inversion in Conformal Geometric Algebra: the Euclidean vector \(a\) (blue) is transformed via CGA using sandwich products. Its inversion (red) is given by \(eF(a)e\) .

Translations

A translation takes place if \(x \rightarrow x + a\) . We consider the rotor \(R = T_a = \exp(na/2)\) and use the Taylor expansion of the exponential. Since \(n\) is a null vector and \(an = -na\) , all higher-order terms vanish, and we can rewrite it as:

\[R = T_a = \exp\left(\frac{na}{2}\right) = 1 + \frac{na}{2} + \frac{1}{2}\left(\frac{na}{2}\right)^2 + \dots = 1 + \frac{na}{2} \quad (1.25)\]

Some useful properties:

  • \(Rn_\infty\tilde{R} = n_\infty\) since \(n_\infty\) represents a point at infinity
  • \(Rn_0\tilde{R} = n_0 - 2a - a^2n_\infty \rightarrow R(-n_0)\tilde{R} = F(a)\) since \(-n_0\) represents the origin
  • \(Rx\tilde{R} = x + n_\infty(a \cdot x)\) , using the relation between the dot product and the geometric product.

The above properties allow us to show that

\[RF(x)\tilde{R} = \frac{1}{2}((x+a)^2 n_\infty + 2(x+a) - n_0) = F(x+a) \quad (1.26)\]
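The translation property can be verified numerically. The sketch below (our own minimal multivector code, not the thesis's) builds \(T_a = 1 + na/2\) and applies the sandwich to Hestenes' un-normalised null vector \(F(x) = x^2n + 2x - \bar{n}\) of Eq. (1.20), for which the same property holds:

```python
def blade_product(a, b, sig):
    """Geometric product of basis blades (sorted index tuples)."""
    sign, idx = 1, list(a) + list(b)
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    out, k = [], 0
    while k < len(idx):
        if k + 1 < len(idx) and idx[k] == idx[k + 1]:
            sign *= sig[idx[k]]
            k += 2
        else:
            out.append(idx[k])
            k += 1
    return sign, tuple(out)

def gp(A, B, sig):
    """Geometric product of multivectors {blade: coefficient}."""
    C = {}
    for ba, ca in A.items():
        for bb, cb in B.items():
            s, blade = blade_product(ba, bb, sig)
            C[blade] = C.get(blade, 0.0) + s * ca * cb
    return {k: v for k, v in C.items() if abs(v) > 1e-12}

def reverse(A):
    return {b: c * (-1) ** (len(b) * (len(b) - 1) // 2) for b, c in A.items()}

def add(A, B):
    C = dict(A)
    for b, c in B.items():
        C[b] = C.get(b, 0.0) + c
    return {k: v for k, v in C.items() if abs(v) > 1e-12}

def scale(A, s):
    return {b: c * s for b, c in A.items()}

SIG = {1: 1, 2: 1, 3: 1, 4: 1, 5: -1}          # G(4,1,0), with e = e4, ebar = e5
n, nbar = {(4,): 1.0, (5,): 1.0}, {(4,): 1.0, (5,): -1.0}

def F(x):
    """Hestenes embedding F(x) = x^2 n + 2x - nbar of a Euclidean vector."""
    x2 = sum(c * c for c in x.values())
    return add(add(scale(n, x2), scale(x, 2.0)), scale(nbar, -1.0))

x = {(1,): 1.0}                                 # x = e1
a = {(2,): 3.0}                                 # a = 3 e2
T = add({(): 1.0}, scale(gp(n, a, SIG), 0.5))   # T_a = 1 + n a / 2
lhs = gp(gp(T, F(x), SIG), reverse(T), SIG)     # T_a F(x) T_a~
rhs = F(add(x, a))                              # F(x + a)
print(all(abs(lhs.get(b, 0) - rhs.get(b, 0)) < 1e-9 for b in set(lhs) | set(rhs)))  # True
```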

An example is shown in Fig. 1.6.

![Fig. 1.6](db267ff9c1b97bbae0cb0856be1d8734_img.jpg)

Fig. 1.6 Translations in Conformal Geometric Algebra: the green plane is the result of translating the original pink plane by the Euclidean vector \(a\) . The translation is performed via the sandwich product \(\Pi' = T_a \Pi \tilde{T}_a\) , with \(T_a = 1 - \frac{1}{2}an_\infty\) , \(\tilde{T}_a = 1 + \frac{1}{2}an_\infty\) .

Dilations

A dilation in GA is given by \(x \rightarrow \alpha x\) . We consider the rotor \(R = D_\alpha = \exp(\frac{\alpha}{2}e\bar{e})\) . We have that

\[\begin{aligned} D_\alpha F(x) \tilde{D}_\alpha &= \exp\left(\frac{\alpha}{2}e\bar{e}\right) \frac{1}{2}(x^2 n_\infty + 2x - n_0) \exp\left(-\frac{\alpha}{2}e\bar{e}\right) = \\ &= \frac{1}{2}(x^2 \exp(\alpha e\bar{e}) n_\infty + 2x - \exp(\alpha e\bar{e}) n_0) = \\ &= \exp(\alpha) \frac{1}{2}(\hat{x}^2 n_\infty + 2\hat{x} - n_0) \end{aligned} \quad (1.27)\]

with \(\hat{x} = \exp(-\alpha)x\) . Dilations in GA are represented by a rotor \(R = D_\alpha\) in conformal space (see Fig. 1.7). A summary of the transformations in CGA is given in Table 1.1.

Table 1.1 Summary of common geometric transformations in GA and CGA

| Transformation | GA | CGA |
| --- | --- | --- |
| Rotation | \(x \rightarrow Rx\tilde{R}\) | \(F(x) \rightarrow F(Rx\tilde{R})\) |
| Inversion | \(x \rightarrow \frac{x}{x^2}\) | \(F(x) \rightarrow -\frac{eF(x)e}{x^2}\) |
| Translation | \(x \rightarrow x + a\) | \(F(x) \rightarrow T_a F(x) \tilde{T}_a\) |
| Dilation | \(x \rightarrow \alpha x\) | \(F(x) \rightarrow D_\alpha F(x) \tilde{D}_\alpha\) |

![Fig. 1.7](fef13e705ab28b357c22ed6444dde1a2_img.jpg)

Fig. 1.7 Dilations in Conformal Geometric Algebra: the teal sphere \(\Sigma'\) is the result of dilating the original yellow sphere \(\Sigma\) by the scale factor \(\alpha = 6\) . The dilation is performed via the sandwich product \(\Sigma' = D_\alpha \Sigma \tilde{D}_\alpha\) , with \(D_\alpha = \exp(\frac{\alpha}{2}e\bar{e})\) .

1.2.7 Objects and incidence

Point pairs, lines, planes, circles and spheres are all represented by blades in the 5D CGA. They can be transformed with rotors. The duals of these objects are used to extract their meaningful features. The simplest object in CGA is represented by a blade of the kind \(A \wedge B\) , which represents a point pair.

Lines

If a line \(L\) passes through two points \(a, b\) with CGA representation \(A\) and \(B\) , then the line can be represented by \(L = A \wedge B \wedge n_\infty\) . The representation \(X\) of any point on the line will satisfy \(X \wedge L = 0\) . This is an incidence relation, i.e. \(X \wedge Y \wedge \dots \wedge Z = 0\) , and it is invariant under rotor/reflection operations. Let's consider two points \(x_1, x_2\) on a line \(L\) with corresponding representation \(X_1, X_2\) in \(\mathcal{G}(p+1, q+1, r)\) . Without loss of generality we assume that the line passes through the origin in the direction of \(e_1\) . Hence \(X_i\) only contains the vectors \(n_\infty, n_0, e_1\) , as any point on the line must have form \(x = \lambda e_1\) . If \(X\) is any point on the line, then the incidence relation is given by

\[X \wedge X_1 \wedge X_2 \wedge n_\infty = X \wedge L = 0 \quad (1.28)\]

Planes

Without loss of generality, consider the plane spanned by \(e_1, e_2\) . A point \(x\) lies on the plane if \(x = \lambda e_1 + \mu e_2\) . Its conformal representation will only contain \(n_\infty, n_0, e_1, e_2\) as \(X = \frac{1}{2}(x^2 n_\infty + 2(\lambda e_1 + \mu e_2) - n_0)\) . Let \(\Pi = X_1 \wedge X_2 \wedge X_3 \wedge n_\infty = \beta(n_\infty \wedge n_0 \wedge e_1 \wedge e_2)\) , so if a point \(X\) lies on the plane we have that

\[X \wedge X_1 \wedge X_2 \wedge X_3 \wedge n_\infty = X \wedge \Pi = 0 \quad (1.29)\]

Circles

Consider \(\mathcal{G}_{2,0,0}\) , i.e. the usual Euclidean plane with basis \(e_1, e_2\) , where \(e_1^2 = e_2^2 = 1\) . Let \(L\) be the line \(x = 1\) and \(a\) a point on it with coordinates \((1, y)\) . Assume we want to invert the points on this line: this gives the set \(L \rightarrow \left(\frac{1}{1+y^2}, \frac{y}{1+y^2}\right) = (x', y')\) . It can be shown that \([x' - 1/2]^2 + y'^2 = (1/2)^2\) , which is the equation of a circle centred at \((1/2, 0)\) with radius \(1/2\) : we have mapped a line into a circle. If \(X\) is a point on the line, its inversion \(X'\) lies on the circle. Since the incidence relation is invariant to inversion, a point \(X'\) belongs to the circle if

\[\begin{aligned} X' \wedge X'_1 \wedge X'_2 \wedge X'_3 &= e(X \wedge X_1 \wedge X_2 \wedge X_3)e = eXe \wedge eX_1e \wedge eX_2e \wedge eX_3e = \\ &= X' \wedge \mathcal{C} = 0 \end{aligned} \quad (1.30)\]

We know that a line can be represented as a trivector of the form \(L = X_1 \wedge X_2 \wedge n_\infty\) ; similarly, any circle can be represented as \(\mathcal{C} = X_1 \wedge X_2 \wedge X_3\) , where no \(X_i\) is a multiple of \(n_\infty\) : a line is a circle passing through the point at infinity. Examples of lines and circles in CGA are shown in Fig. 1.8.
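The incidence relation can be checked numerically: for three collinear Euclidean points, the conformal representation of the third satisfies \(X_3 \wedge L = 0\) , with \(L = X_1 \wedge X_2 \wedge n_\infty\) . An illustrative sketch using Hestenes' embedding on the basis \((e_1, e_2, e_3, e, \bar{e})\) (our own code and conventions):

```python
def outer_blades(a, b):
    """Outer product of basis blades: zero if they share an index,
    otherwise the permutation sign of sorting the concatenation."""
    if set(a) & set(b):
        return 0, ()
    sign, idx = 1, list(a) + list(b)
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def op(A, B):
    """Outer (wedge) product of multivectors {blade: coeff}."""
    C = {}
    for ba, ca in A.items():
        for bb, cb in B.items():
            s, blade = outer_blades(ba, bb)
            if s:
                C[blade] = C.get(blade, 0.0) + s * ca * cb
    return {k: v for k, v in C.items() if abs(v) > 1e-12}

def F(x):
    """Hestenes embedding as a grade-1 multivector on (e1..e3, e=e4, ebar=e5)."""
    x2 = sum(v * v for v in x)
    mv = {(i + 1,): 2.0 * x[i] for i in range(3)}
    mv[(4,)] = x2 - 1.0     # coefficient of e    in  x^2 n - nbar
    mv[(5,)] = x2 + 1.0     # coefficient of ebar in  x^2 n - nbar
    return {k: v for k, v in mv.items() if abs(v) > 1e-12}

n_inf = {(4,): 1.0, (5,): 1.0}                    # n = e + ebar
x1, x2, x3 = (1.0, 1.0, 0.0), (3.0, 2.0, 1.0), (5.0, 3.0, 2.0)  # collinear points
L = op(op(F(x1), F(x2)), n_inf)                   # line through x1 and x2
print(op(F(x3), L))   # {}: X3 ^ L = 0, so x3 lies on the line
```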

![Fig. 1.8](640118899584f24b3d6373431a6c7722_img.jpg)

Fig. 1.8 Lines and circles in Conformal Geometric Algebra are trivectors of the type \(X_1 \wedge X_2 \wedge X_3\) . A line can be interpreted as a circle with infinite radius, i.e. containing the point at infinity \(X_3 = n_\infty\) .

Spheres

Given any 4 points with 5D representation \(X_1, X_2, X_3, X_4\) , the sphere through those points is given by the 4-vector \(\Sigma = X_1 \wedge X_2 \wedge X_3 \wedge X_4\) . If \(X\) lies on \(\Sigma\) , then \(X \wedge \Sigma = 0\) , which is equivalent to \(X \cdot (\Sigma I) = 0 \rightarrow X \cdot \Sigma^* = 0\) , where \(\Sigma^*\) is the dual of \(\Sigma\) and hence a vector. Again, since a plane can be represented as a 4-vector of the form \(\Pi = X_1 \wedge X_2 \wedge X_3 \wedge n_\infty\) , it can be seen as a sphere passing through a point at infinity. Examples of planes and spheres in CGA are shown in Fig. 1.9. A summary of CGA objects as blades is given in Table 1.2.

![Fig. 1.9](967e08f00a4fffdc167c53c3bad53c84_img.jpg)

Fig. 1.9 Planes and spheres in Conformal Geometric Algebra are quadrivectors of the type \(X_1 \wedge X_2 \wedge X_3 \wedge X_4\) . A plane can be interpreted as a sphere with infinite radius, i.e. containing the point at infinity \(X_4 = n_\infty\) .

Table 1.2 Objects in CGA

| Grade | Symbol | Meaning |
| --- | --- | --- |
| 1 | \(A\) | point |
| 2 | \(A \wedge B\) | point pair |
| 3 | \(A \wedge B \wedge C\) | circle ( \(\mathcal{C}\) ) |
| 3 | \(A \wedge B \wedge n_\infty\) | line ( \(L\) ) |
| 4 | \(A \wedge B \wedge C \wedge D\) | sphere ( \(\Sigma\) ) |
| 4 | \(A \wedge B \wedge C \wedge n_\infty\) | plane ( \(\Pi\) ) |

Intersections

The intersection of two objects \(W_r, W_s\) is obtained by taking the dual of the outer product of their duals, which is of grade \(2n - r - s\) . To this purpose, we can define the meet operator as

\[W_r \vee W_s = (W_r^* \wedge W_s^*)^* \quad (1.31)\]

Intuitively, the intersection of a sphere and a circle results in a pair of points, while the intersection of a sphere and a plane yields a circle, provided that the join of the two entities encompasses the entire space. Examples of the use of the meet operator are given in Figs. 1.10 and 1.11.

![Fig. 1.10](fb15f15a7d964c8490c9317e43e40c1a_img.jpg)

Fig. 1.10 The meet operator between two grade-4 objects (two spheres, two planes, or a sphere and a plane) yields a grade-3 object (a circle or a line), unless degenerate. Here \(\Sigma_1 \vee \Sigma_2 = \mathcal{C}_1\) , \(\Sigma_1 \vee \Pi = \mathcal{C}_2\) and \(\Pi_1 \vee \Pi_2 = L\) .

![Fig. 1.11](be0e6826ff90caab1b4da9ec86a23e96_img.jpg)

Fig. 1.11 The meet operator between a grade-4 object (sphere or plane) and a grade-3 object (line or circle) yields a grade-2 object (a point pair), unless degenerate. Here \(\Sigma \vee \mathcal{C} = A \wedge B\) , \(\Sigma \vee L = C \wedge D\) and \(\Pi \vee \mathcal{C} = E \wedge F\) .

1.2.8 Other algebras

1d-Up CGA

When dealing with transformations in Euclidean geometry in CGA, the point at infinity \(n_\infty\) is kept constant. However, we can work with non-Euclidean geometries if we keep different quantities constant. For example, by keeping \(e\) constant, it can be shown that we are left with a hyperbolic geometry. Similarly, when \(\bar{e}\) is kept constant we are left with a spherical geometry. Note how, by keeping either one of the bases \(e\) or \(\bar{e}\) constant, we have only one additional basis vector compared to 3D GA (hence the name “1D-Up”) instead of two as in CGA (which is a “2D-Up” space compared to 3D GA) [95–97].

We will work with the latter case in Chapter 4, in which \(\bar{e}\) is kept constant and \(e\) is our origin. The main advantages of the 1D-Up approach are: (1) a lower dimensionality of the space compared to CGA, (2) that the Euclidean nature of the space (i.e. that all basis vectors square to \(+1\) ) allows us to construct a simple loss function that is invariant under rigid body transformations and (3) that both translations and rotations can be expressed as rotors, just as in the 2d-Up space. The curvature of the space is controlled by a free parameter \(\lambda\) (see Fig. 1.12), which we will discuss in further detail in Chapter 4.

Spacetime Algebra (STA)

The Spacetime Algebra (STA) of 3D space, i.e. \(\mathcal{G}_{1,3,0}\) , is spanned by four basis vectors \(\{\gamma_\mu\}\) , with \(\gamma_0^2 = -\gamma_k^2 = 1\) and \(k = 1, 2, 3\) . The pseudoscalar is defined as \(I = \gamma_0\gamma_1\gamma_2\gamma_3\) . We say that \(\gamma_0\) is a timelike vector, while \(\gamma_1, \gamma_2, \gamma_3\) are spacelike vectors. A breakdown of the blades in STA is given in Table 1.3.

![Fig. 1.12](c436e079f79bca972b79ed4b3e4613ea_img.jpg)

Fig. 1.12 The parameter \(\lambda\) controls the curvature of the space in 1d-Up CGA. As \(\lambda\) increases, lines in \(G(4, 0, 0)\) are transformed into circles and planes are transformed into spheres.

The even grade elements of the 3D STA form a subalgebra \(\mathcal{G}_{3,0,0}\) which is equivalent to the Pauli algebra. In fact, defining:

\[\sigma_k = \gamma_k \wedge \gamma_0 \quad (1.32)\]

we have that \(\sigma_k^2 = 1\) and that \(\sigma_1 \sigma_2 = I\sigma_3\) , \(\sigma_2 \sigma_3 = I\sigma_1\) and \(\sigma_3 \sigma_1 = I\sigma_2\) , i.e. the \(\sigma_k\) multiply exactly like the Pauli matrices. We will work with STA in Chapter 5.
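These relations can be checked with a small blade-product sketch over the STA signature (our own illustrative code, with basis index \(0\) for \(\gamma_0\) and \(1, 2, 3\) for the \(\gamma_k\) ):

```python
def blade_product(a, b, sig):
    """Geometric product of basis blades (sorted index tuples)."""
    sign, idx = 1, list(a) + list(b)
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    out, k = [], 0
    while k < len(idx):
        if k + 1 < len(idx) and idx[k] == idx[k + 1]:
            sign *= sig[idx[k]]
            k += 2
        else:
            out.append(idx[k])
            k += 1
    return sign, tuple(out)

def gp(A, B, sig):
    """Geometric product of multivectors {blade: coefficient}."""
    C = {}
    for ba, ca in A.items():
        for bb, cb in B.items():
            s, blade = blade_product(ba, bb, sig)
            C[blade] = C.get(blade, 0.0) + s * ca * cb
    return {k: v for k, v in C.items() if abs(v) > 1e-12}

SIG = {0: 1, 1: -1, 2: -1, 3: -1}        # gamma_0^2 = 1, gamma_k^2 = -1

def sigma(k):
    """sigma_k = gamma_k gamma_0, stored as {blade: coefficient}."""
    return gp({(k,): 1.0}, {(0,): 1.0}, SIG)

I = {(0, 1, 2, 3): 1.0}                  # pseudoscalar gamma_0123

print(gp(sigma(1), sigma(1), SIG))       # {(): 1.0}: sigma_k squares to +1
print(gp(sigma(1), sigma(2), SIG))       # both lines print {(1, 2): -1.0},
print(gp(I, sigma(3), SIG))              # i.e. sigma_1 sigma_2 = I sigma_3
```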

Table 1.3 Blades in Spacetime Algebra (STA) \(\mathcal{G}_{1,3,0}\) . Elements colored in red square to \(+1\) , while elements colored in blue square to \(-1\) .

| Grade | Orthonormal Basis | Blade Type | Geometry |
| --- | --- | --- | --- |
| 0 | \(1\) | Scalar | Magnitude |
| 1 | \(\gamma_0, \gamma_1, \gamma_2, \gamma_3\) | Vector | Events in spacetime |
| 2 | \(\gamma_{01}, \gamma_{02}, \gamma_{03}\) | Bivector | Planes (time-like + space-like) |
| 2 | \(\gamma_{12}, \gamma_{13}, \gamma_{23}\) | Bivector | Purely spatial planes |
| 3 | \(\gamma_{012}, \gamma_{013}, \gamma_{023}\) | Trivector | Mixed volumes (1 time + 2 space) |
| 3 | \(\gamma_{123}\) | Trivector | Purely spatial volume |
| 4 | \(\gamma_{0123}\) | Pseudoscalar | 4-volume / Orientation |

1.3 Learning with GA

The idea of using GA as a computational substrate for machine learning dates back to early works such as [135], and was further developed in the early 2000s through the contributions of [22, 23, 21, 12]. These studies introduced the notion of Clifford neurons, exploring the use of GA to design learning units and architectures that operate directly on multivectors. While visionary, these efforts remained mostly theoretical and were largely overlooked for over a decade, as the machine learning community increasingly consolidated around real-valued vector spaces and conventional deep learning frameworks.

Interest in GA for machine learning began to re-emerge in the 2020s. This revival was catalyzed by the broader development of geometric deep learning, a subfield of deep learning focused on building architectures that respect the symmetries and structure of geometric data. It leverages tools from group theory and differential geometry to enforce invariance and equivariance on data such as graphs, manifolds, and point clouds.

A notable contribution to this renewed interest came from Melnyk et al. [128], who introduced the Multilayer Geometric Perceptron (MLGP), an architecture composed of geometric neurons that achieves rotation- and translation-equivariant shape classification. This work represented one of the first functioning implementations of a neural network operating directly within a GA framework.

Following this, Brandstetter et al. [17] showed that Clifford neural networks can outperform standard real-valued architectures on partial differential equation (PDE) solving tasks. Their success was attributed to the natural ability of GA to represent vector fields and encode geometric constraints. In parallel, Ruhe et al. [158, 159, 19] extended these foundations by constructing architectures that are equivariant under the full Clifford group and introduced the idea of learning geometric templates and primitives. These developments reinforced the role of GA as a powerful tool for geometric computation in machine learning.

How does this thesis relate to the broader field of geometric deep learning?

Objects and operations introduced in Section 1.2 serve as the mathematical backbone throughout this thesis, as we will exclusively model problems in Bioinformatics (Chapter 3), Computer Vision (Chapter 4), and Physics Modelling (Chapter 5) in terms of multivectors. We will employ multivectors in two ways (see Fig. 1.13):

  1. Modelling: we simplify, model, and represent data in terms of GA objects and operations, including lines, planes, circles and rotors.
  2. Computing: we design learning units capable of processing multivector-valued data through specialized, GA-native operations, including geometric and sandwich products.

187d05bf7ead21e1394b61320d8b3632_img.jpg

The diagram illustrates the relationship between Machine Learning with Geometric Algebra and Geometric Deep Learning. It is divided into two main sections: 'Machine Learning with Geometric Algebra' (left, yellow box) and 'Geometric Deep Learning' (right, blue box). The 'Machine Learning' section is further divided into 'Modelling' (Chapters 2, 3) and 'Computing' (Chapters 3, 4, 5). 'Modelling' includes molecular geometry optimisation, learning rotation, and cost maps, leading to CGAPoseNet. 'Computing' includes STAResNet, Fengbo, GA-ReLU, CGAPoseNet+GCAN, Define, Refine, Align, and CGENN 3D Projector.


Fig. 1.13 Multivectors for Modelling, Understanding, and Computing. This thesis aims to demonstrate the power of GA across the entire machine learning pipeline. Early works, such as those on rotation estimation and molecular geometry optimization (Chapter 2), use multivectors as a modelling tool. Later works like CGAPoseNet+GCAN (Chapter 4) and STAResNet (Chapter 5) exemplify true geometric deep learning, where multivectors serve not only as a modelling tool, but also as computational units within neural architectures. Both approaches serve the goal of understanding how geometric structure informs and constrains the learning process.

Earlier chapters (Chapter 2 and the first part of Chapter 3) concentrate on non-geometric settings: GA is used primarily as a modelling and representational framework, rather than as a means of enforcing geometric constraints or equivariances within architectures. Later chapters (the second half of Chapter 3, and Chapters 4 and 5) explore equivariant and geometric architectures relying on multivector-valued learnable parameters. This second half of the manuscript fits within geometric deep learning.

This division reflects both the natural progression of the PhD research and an intentional choice to highlight the broader capabilities of GA, as captured in the subtitle of the thesis: Multivectors for Modelling, Understanding, and Computing.

1.3.1 GA: the 6th “G” of Geometric Deep Learning?

A popular taxonomy of the field of geometric deep learning is the one introduced in [20]. This framework identifies five key domains, referred to as the 5 G’s, that capture the structural priors and symmetries commonly found in deep learning problems:

  • Grids: Data defined on regular Euclidean domains, such as images, audio signals, or video frames. Convolutional neural networks (CNNs) are the canonical models here, exploiting the translation symmetry of the grid [104].
  • Groups: Symmetries formalized via group theory, such as rotations, reflections, or permutations. Learning architectures that are invariant or equivariant under group actions fall into this category, improving generalization by embedding prior symmetry knowledge [34, 35, 40].
  • Graphs: Data with relational structure, where nodes and edges represent entities and their interactions. Graph neural networks (GNNs) operate in this regime, enabling learning on irregular, non-Euclidean domains such as social networks or molecular graphs [179, 190, 178].
  • Geodesics: Data that lies on smooth manifolds, such as spherical signals or curved surfaces. Models must account for curvature and intrinsic geometry, often using tools from differential geometry to define distances and gradients [125].
  • Gauges: The most abstract setting, involving local symmetries and transformations in fiber bundles. Gauge-equivariant architectures are particularly relevant in physics-informed learning and scenarios where data is expressed in locally varying frames [39].

How does GA fit within this taxonomy? GA provides a unified algebraic framework that naturally encompasses each of these domains:

  • Grids: GA offers coordinate-free representations for regular grid structures. In 2D or 3D grids (e.g., images or volumetric data), multivectors can efficiently encode geometric primitives such as edges, planes, and volumes. The geometric product captures local spatial relationships naturally, serving as a generalization of convolutional operations.
  • Groups: GA excels in representing symmetry groups central to geometric deep learning: rotations via rotors in the even subalgebra; translations and scalings via versors in CGA; conformal transformations unified in the algebra \(\mathcal{G}_{4,1}\); Lorentz transformations for spacetime models in physics. GA makes group actions both mathematically elegant and computationally tractable.

  • Graphs: GA can enrich graph neural networks by using multivectors to represent edges and node features. Unlike scalar weights, multivector-valued edge attributes can encode directionality, orientation, and higher-order geometric relationships (e.g., areas and volumes), providing a more expressive relational framework.
  • Geodesics: GA is particularly powerful for reasoning on manifolds. In CGA, geodesics (e.g., great circles on spheres or hyperbolic lines) can be expressed algebraically as intersections of geometric primitives. This facilitates learning tasks that depend on intrinsic distances, such as manifold embeddings or molecular conformer prediction.
  • Gauges: GA naturally handles basis changes and frame transformations. Its invariance under orthogonal transformations enables the construction of gauge-equivariant networks, useful for modelling systems with local symmetries, as seen in physics-informed and gauge-theoretic machine learning.

We therefore argue that GA functions as a meta-framework: whereas each of the 5 G's typically requires distinct mathematical tools, such as tensors, graphs, Lie groups, or Riemannian manifolds, GA unifies them within a single consistent formalism. This enables end-to-end learning systems in which representation, transformation, and reasoning are all performed natively in the same geometric space.

In summary, GA is not limited to any single domain among the 5 G's. Rather, it subsumes all five within a compact and expressive language, making it a compelling foundation for modelling, understanding, and computing in geometric deep learning (see Table 1.4).

1.3.2 Learning with GA in practice

In Table 1.4, we used the expression “embedding in a GA”. In this section, we clarify what that means in practice. Working with networks in GA means extending traditional models to operate directly on multivector-valued data, enabling more expressive representations of geometry, symmetry, and structure. This approach can be realized through the following three-step workflow:

Table 1.4 Embedding an architecture in a GA \(\mathcal{G}\) enables geometric deep learning on any of the 5 G's of [20]. Left column: architectures on which geometric deep learning is performed; right column: example of possible embedding of such architectures in GA. In this thesis, we provide examples of GA-based learning for grids (Chapter 5), groups (Chapter 3-4), geodesics (Chapter 4).

| G | Geometric deep learning | GA-based geometric deep learning |
|---|---|---|
| Grids | CNNs, U-Nets, ResNets on regular lattices | \(\mathcal{G}_{2,0}\) (2D grids), \(\mathcal{G}_{3,0}\) (3D grids), \(\mathcal{G}_{1,3}\) (spacetime grids) |
| Groups | Group-equivariant CNNs, steerable networks | \(\mathcal{G}_{3,0}\) (\(\text{SO}(3)\) rotations), \(\mathcal{G}_{1,1}\) (conformal group), \(\mathcal{G}_{1,3}\) (Lorentz group) |
| Graphs | GCNs, GraphSAGE, GATs | \(\mathcal{G}_{n,0}\) (Euclidean node features), \(\mathcal{G}_{3,0}\) (3D molecular graphs) |
| Geodesics | Riemannian NNs, hyperbolic NNs | \(\mathcal{G}_{n+1}\) (spherical), \(\mathcal{G}_{n+1}\) (hyperbolic), \(\mathcal{G}_{4,1}\) (conformal model) |
| Gauges | Gauge-equivariant CNNs, fiber bundle models | \(\mathcal{G}_{1,3}\) (spacetime gauge fields), \(\mathcal{G}_{2,0}\) (2D Yang–Mills), \(\mathcal{G}_{4,1}\) (conformal gauge) |

  1. Choose a suitable GA: Select an algebra \(\mathcal{G}_{p,q,r}\) that reflects the geometry of the problem domain. For instance, \(\mathcal{G}_{2,0}\) is appropriate for modelling 2D physical systems such as 2D Navier–Stokes equations, while \(\mathcal{G}_{4,1}\) , a conformal model, is well-suited for tasks involving 3D rigid-body motions.
  2. Embed your model in the algebra: Lift all data and learnable parameters into the chosen GA by introducing an additional tensor dimension of size \(2^n\) , where \(n = p + q + r\) is the number of basis vectors. Each element along this new axis corresponds to a basis blade (e.g., scalar, vector, bivector), allowing the network to represent and manipulate the full geometric content of the input.
  3. Build your architecture: Adapt standard neural architectures (e.g., CNNs, GNNs, transformers) to perform GA-native operations, such as the geometric product, outer product, and grade projections. These operations replace traditional scalar or matrix multiplications, enabling the model to reason over geometric structure in a principled way.

Note how this workflow is model-agnostic: it can be applied to a wide range of architectures, systematically incorporating geometric structure and symmetry through GA. The rest of this thesis assumes this workflow for building GA networks.
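As a minimal illustration of step 2 of this workflow (a sketch, not the thesis code), the example below hand-codes \(\mathcal{G}_{2,0}\): every value becomes a \(2^2 = 4\)-component multivector with coefficients ordered as \([1, e_1, e_2, e_{12}]\), and the geometric product plus a rotor sandwich replace ordinary matrix multiplication. Function names are illustrative:

```python
import numpy as np

def geometric_product(a, b):
    """Geometric product of two G(2,0) multivectors [1, e1, e2, e12]."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([
        a0*b0 + a1*b1 + a2*b2 - a3*b3,   # scalar (e1^2 = e2^2 = 1, e12^2 = -1)
        a0*b1 + a1*b0 - a2*b3 + a3*b2,   # e1
        a0*b2 + a2*b0 + a1*b3 - a3*b1,   # e2
        a0*b3 + a3*b0 + a1*b2 - a2*b1,   # e12
    ])

def reverse(a):
    """Reversion: flips the sign of the bivector part."""
    return a * np.array([1.0, 1.0, 1.0, -1.0])

e1  = np.array([0.0, 1.0, 0.0, 0.0])
e2  = np.array([0.0, 0.0, 1.0, 0.0])
e12 = geometric_product(e1, e2)                  # the unit bivector
assert np.allclose(geometric_product(e12, e12), [-1, 0, 0, 0])

# Rotor sandwich: R = exp(-e12 * theta/2) rotates e1 by theta in the e12 plane.
theta = np.pi / 2
R = np.array([np.cos(theta/2), 0.0, 0.0, -np.sin(theta/2)])
v = geometric_product(geometric_product(R, e1), reverse(R))
assert np.allclose(v, e2)                        # e1 rotated by 90 degrees -> e2
```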

Chapter 2

Rotations as Rotors

There are no answers, only choices.


Solaris, Stanislaw Lem (1961)

2.1 Introduction

In this chapter we focus on the theme of representation. What defines a good choice when representing rotations? Are rotors a “good” mathematical representation for rotations? Do rotors lend themselves to learning problems? Is GA an optimal space in which to minimise a given objective function? We address these questions through two examples that will lay the foundations for the following chapters and justify our use of rotors in them.

Firstly, we discuss the issue of discontinuities in the representation of rotations, how it affects learning and how rotors (and bivectors) can overcome it in a more compact way compared to other methods in the literature through three toy problems in computer vision.

We then move onto the problem of molecular geometry optimisation in computational chemistry, and show how rotors can more effectively guide an unstable molecular conformation to its energetic minimum as opposed to other representations.

2.2 Learning rotations

2.2.1 Background

In problems such as pose prediction and generation, motion capture and inverse kinematics, neural networks (NNs) are generally trained to perform a regression on rotations [150, 126]. For these tasks, it is common to describe 3D rotations through 3 or 4D representations, including axis-angle representation [79], Euler angles [49] and quaternions [134].

However, it has been shown that 3 and 4D representations present some limitations. Grassia et al. [58], for example, demonstrated how an exponential parametrisation of rotations is more suitable for differentiation and integration as opposed to Euler angles or quaternions. Zhou et al. [201] found the source of large regression errors in the discontinuity of these representations, in which a representation is said to be discontinuous when the mapping \(g : SO(3) \rightarrow \mathbb{R}^D\) from the \(3 \times 3\) rotation matrix in \(SO(3)\) onto a given representation space is a discontinuous function. Saxena et al. [161] also highlighted representation discontinuity as a critical factor.

According to the universal approximation theorem, on the other hand, a NN with one hidden layer can approximate any continuous function for inputs that fall within a specific range [76]. Moreover, the process of training requires extensive computations of derivatives. Hence, trying to learn rotations when parametrised with a discontinuous representation might lead to large errors.

2.2.2 The 6D representation

A possible solution to overcome the discontinuity issue is to increase the number of degrees of freedom (DoF). In [201], 5 and 6D continuous representations have been proposed specifically for deep learning approaches to computer vision in order to represent rotations exclusively (i.e. without translation component). This 6D representation can be visualized as a \(3 \times 2\) matrix with columns \(a_1, a_2\) , which, after a Gram-Schmidt-like orthogonalization procedure, maps to the original rotation matrix in \(SO(3)\) with orthonormal columns \(b_1, b_2, b_3\) (Eq. 2.1).

\[g^{-1} \left( \begin{bmatrix} | & | \\ a_1 & a_2 \\ | & | \end{bmatrix} \right) \mapsto \begin{bmatrix} | & | & | \\ b_1 & b_2 & b_3 \\ | & | & | \end{bmatrix} \in SO(3) \quad (2.1)\]

in which \(g^{-1} : \mathbb{R}^D \rightarrow SO(3)\) is the mapping from the representation space to the \(3 \times 3\) rotation matrix in \(SO(3)\) . We believe that this representation has some fundamental limitations:

  • The 6D representation proposed has more DoF than commonly employed representations, meaning more parameters to be learned.
  • It is fully learned. This guarantees continuity, but it also means that no closed form expression exists to convert a rotation from \(SO(3)\) into its 6D representation.
  • It is tied to an ad hoc loss function to be minimised, namely the \(L_2\) norm between the original and predicted rotation matrices after the aforementioned orthogonalization procedure, that has to be performed at every iteration. This loss function is less intuitive and more computationally expensive than those normally used in regression tasks.
  • No conditions are imposed to avoid \(a_1 = b_1; a_2 = b_2\) i.e. that the 6D representation is learned to be equal to the first two columns of the rotation matrix. This identity mapping is easier to "learn" and indeed a possible scenario, but makes the comparison with other representations biased.

We address these points by reproducing the experiments in [201] employing a full GA formulation.
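The Gram-Schmidt-like orthogonalization of Eq. 2.1 can be sketched as follows; `six_d_to_rotation` is an illustrative name for this map, not Zhou et al.'s implementation:

```python
import numpy as np

def six_d_to_rotation(a):
    """Map a 6D representation a = (a1, a2) to a rotation matrix in SO(3)."""
    a1, a2 = np.asarray(a[:3], float), np.asarray(a[3:], float)
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - (b1 @ a2) * b1          # remove the component of a2 along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)             # completes a right-handed orthonormal frame
    return np.column_stack([b1, b2, b3])

M = six_d_to_rotation([1.0, 2.0, 0.0, 0.0, 1.0, 1.0])
assert np.allclose(M.T @ M, np.eye(3))       # orthonormal columns
assert np.isclose(np.linalg.det(M), 1.0)     # proper rotation, det = +1
```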

2.2.3 Learning mappings between representation spaces

The first experiment, which we will refer to as ‘sanity check’, is to use a multi-layer perceptron (MLP) to learn the mapping from rotation matrix \(M \in SO(3)\) to a given representation \(\mathcal{R} \in \mathbb{R}^D\) , in which \(D\) is the number of DoF associated with \(\mathcal{R}\) , or its dimensionality. Then, the representation \(\mathcal{R}\) is converted back to an estimate \(M'\) and the closeness between \(M'\) and \(M\) is measured. We first tested Zhou’s 6D representation along with quaternions, axis-angle, Euler angles and bivectors.

The network has four dense layers with 128 neurons each and a Leaky ReLU activation function between them. We used the Adam optimiser with the default learning rate of 0.001, \(E = 100\) epochs, and a batch size of 64. As a loss function, we adopted either the mean squared error (MSE) or the mean absolute error (MAE) between the target and input representation, defined as:

\[MSE = \frac{1}{D} \sum_{i=1}^D (y_i - y'_i)^2 \quad (2.2)\]

\[MAE = \frac{1}{D} \sum_{i=1}^D |y_i - y'_i| \quad (2.3)\]

where \(D\) , \(y\) and \(y'\) are the representation dimensionality, original and predicted representations, respectively. In the case in which \(y\) is a bivector, for example, then the \(y_i\) 's will be the coefficients \(b_{12}, b_{23}, b_{13}\) of the bivectors in \(\mathcal{G}_{3,0,0}\) ; if \(y\) are Euler angles, then the \(y_i\) 's will correspond to the angles \(\alpha, \beta, \gamma\) . We adopted the MSE as it is simple and commonly used in regression tasks. Only for the 6D representation did we employ the \(L_2\) distance between rotation matrix \(M\) and predicted matrix \(M'\) , defined as:

\[L_2 = \sqrt{\sum_{i=1}^3 \sum_{j=1}^3 (y_{ij} - y'_{ij})^2} \quad (2.4)\]

where \(y \equiv M\) , \(y' \equiv M'\) are the original and predicted rotation matrices, respectively. Note that, according to its definition, the 6D representation can only have the \(L_2\) norm between \(M, M'\) as a loss. This is due to two reasons:

  • The main feature of the 6D representation, i.e. that the rotation matrix is obtained through orthogonalization of the 6D representation, is implemented within the loss function itself.
  • There is no closed form expression to represent a rotation matrix as its 6D counterpart (in other words, the mapping \(g : SO(3) \rightarrow \mathbb{R}^6\) is fully learned). This means that the ground truth, when learning the 6D representation, cannot be expressed in 6D, but only as the original rotation matrix.

It is worth mentioning that in [201] all the experiments share the \(L_2\) norm as a loss function. We believe that this is not only a more complex approach, as the mapping \(g^{-1} : \mathbb{R}^D \rightarrow SO(3)\) has to be evaluated at each iteration, but it is also penalizing discontinuous representations (i.e. those representations for which \(g\) is discontinuous): we want to demonstrate that these representations can be learned successfully when this mapping is omitted from the learning strategy.
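The three losses above (Eqs. 2.2-2.4) can be sketched directly; function names are illustrative:

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error over the D representation coefficients (Eq. 2.2)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.mean((y - y_pred) ** 2))

def mae(y, y_pred):
    """Mean absolute error over the D representation coefficients (Eq. 2.3)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y - y_pred)))

def l2_matrix(M, M_pred):
    """L2 (Frobenius) distance between two rotation matrices (Eq. 2.4)."""
    return float(np.sqrt(np.sum((np.asarray(M) - np.asarray(M_pred)) ** 2)))

# e.g. bivector coefficients (b12, b23, b13)
assert np.isclose(mse([1.0, 0.0, 2.0], [1.0, 1.0, 0.0]), 5.0 / 3.0)
assert np.isclose(mae([1.0, 0.0, 2.0], [1.0, 1.0, 0.0]), 1.0)
assert np.isclose(l2_matrix(np.eye(3), np.zeros((3, 3))), np.sqrt(3.0))
```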

The results from [201] have been reproduced and validated on a set of \(10^5\) rotation matrices with a 67-33 train-test split and 30% of the training set employed for validation. The maximum, mean and standard deviation of the geodesic error measured on the testing set are shown in Table 2.1, along with the loss function employed during training and the kind of error employed. The geodesic error is defined as the minimal angular distance between two rotations, where the concept of geodesic is defined in metric geometry as the curve of shortest distance between two points on a manifold. While different formulations exist in the literature, we will stick to the definition given in [201], expressed as

\[\mathcal{L} = \arccos\left(\frac{\text{tr}(M'') - 1}{2}\right) \quad (2.5)\]

where \(M'' = M'M^T\) is the relative rotation between prediction and ground truth. Table 2.1 shows that the 6D representation indeed overcomes the discontinuity issue and significantly outperforms the other representations, including the bivector representation that we tested in addition to the others. Note how, when using MSE between quaternions, the matrix-to-quaternion/rotor learning does not converge to a minimum, most likely due to the antipodal symmetry of the representation, meaning that quaternions \(q\) and \(-q\) represent the same rotation [109]. Our results, however, are in agreement with Zhou’s when employing the same \(L_2\) loss.

Table 2.1 Average Geodesic Error ( \(^\circ\) ), Sanity Check, \(M \mapsto \mathcal{R}\)

| Representation | Loss | Maximum | Mean | Std. Dev. |
|---|---|---|---|---|
| Euler | MSE | 179.7 | 5.93 | 8.68 |
| Axis-Angle | MSE | 179.7 | 4.22 | 8.79 |
| Quaternion/Rotor | MSE | 180.0 | 80.3 | 44.8 |
| Quaternion/Rotor | \(L_2\) | 179.4 | 5.29 | 8.90 |
| Bivector (\(\frac{1-R}{1+R}\)) | MSE | 179.1 | 4.49 | 8.54 |
| 6D | \(L_2\) | 2.64 | 1.06 | 0.41 |
| Euler (Zhou et al.) | \(L_2\) | 179.9 | 6.98 | 17.31 |
| Axis-Angle (Zhou et al.) | \(L_2\) | 179.2 | 3.69 | 5.99 |
| Quaternion (Zhou et al.) | \(L_2\) | 179.9 | 3.32 | 5.97 |
| 6D (Zhou et al.) | \(L_2\) | 1.98 | 0.49 | 0.27 |

GA-based Sanity Check

We believe that the errors shown in Table 2.1 are specific to the kind of mapping learned rather than intrinsic to the representation itself. The maximum \(180^\circ\) error, said to come from the discontinuity of the representation, is in fact present only when adopting Eq. 2.5 as a metric and when the representation is learned starting from the rotation matrix. We now proceed to show that the maximum geodesic error is much smaller when keeping the rotations in GA.

We trained the same network to learn the bivector \(B\) from the rotor \(R\) , either as its Cayley transform or as its logarithm, and then converted back to estimated rotor \(R'\) , where \(R' = e^{-B/2}\) or \(R' = (1 - B)/(1 + B)\) . We measured the regression error via Eq. 2.5 after converting rotors to their corresponding matrix representation.

Results are shown in Table 2.2. Note how comparable results between the 6D representation and the bivector are obtained when expressing the bivector as the Cayley transform of the rotor, with a mean geodesic error of \(1.04^\circ\) for the rotor-to-bivector mapping and of \(1.06^\circ\) for the matrix-to-6D mapping. The bivector, however, only requires 3 DoF to be learned instead of 6. Note also how the same bivector yields a mean geodesic error about three times smaller when learned from the rotor (in Table 2.2) than when learned from the rotation matrix (in Table 2.1).

Table 2.2 Geodesic Error ( \(^\circ\) ), Sanity Check, \(R \mapsto B\)

| Representation | Loss | Maximum | Mean | Std. Dev. |
|---|---|---|---|---|
| Bivector (\(-2 \log R\)) | MSE | 154.1 | 1.37 | 4.84 |
| Bivector (\(\frac{1-R}{1+R}\)) | MSE | 27.7 | 7.33 | 4.12 |
| Bivector (\(\frac{1-R}{1+R}\)) | MAE | 17.11 | 1.04 | 0.61 |
| 6D | \(L_2\) | 2.64 | 1.06 | 0.41 |
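The rotor-to-bivector logarithm map and its inverse can be sketched on the even subalgebra of \(\mathcal{G}_{3,0,0}\), stored here as quaternion-like 4-vectors \([\text{scalar}, b_{23}, b_{13}, b_{12}]\). The sign convention \(R = e^{-B/2}\) follows the text; the storage order and function names are our illustrative choices (the Cayley variant follows analogously):

```python
import numpy as np

def rotor_log(R):
    """Bivector B = -2 log R for a unit rotor R, so rotor_exp(rotor_log(R)) == R."""
    w, v = R[0], np.asarray(R[1:], float)
    n = np.linalg.norm(v)
    if n < 1e-12:                       # identity rotor -> zero bivector
        return np.zeros(3)
    return -2.0 * np.arctan2(n, w) * v / n

def rotor_exp(B):
    """Rotor R = exp(-B/2) for a bivector B given by its 3 coefficients."""
    t = np.linalg.norm(B)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate([[np.cos(t / 2)], -np.sin(t / 2) * np.asarray(B) / t])

# Round trip for a 45-degree rotation in the e12 plane
R = np.array([np.cos(np.pi / 8), 0.0, 0.0, -np.sin(np.pi / 8)])
B = rotor_log(R)
assert np.allclose(rotor_exp(B), R)
assert np.isclose(np.linalg.norm(B), np.pi / 4)   # |B| equals the rotation angle
```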

2.2.4 Pose estimation of 3D point clouds

For this task we trained a network that takes as input a reference and a target point cloud \(P_r, P_t \in \mathbb{R}^{N \times 3}\) respectively, where \(N\) is the number of points per point cloud, to predict a representation \(\mathcal{R} \in \mathbb{R}^D\) of the rotation between the two, where \(D\) is the dimensionality of the representation. These include the rotation matrix ( \(D = 9\) ), the 6D representation ( \(D = 6\) ), Euler angles ( \(D = 3\) ), the axis-angle representation ( \(D = 3\) ), the quaternion/rotor ( \(D = 4\) ) and the bivector ( \(D = 3\) ).

The network is a slightly modified version of PointNet [152], see Figure 2.1. We assume that the registration between the point clouds is known, i.e. we know which points in \(P_r\) correspond to which in \(P_t\) . The input of this network is then the concatenated point clouds \([P_r, P_t]\) of size \(N \times 6\) , passed through five layers of 1D convolutions of size \(6 \times 64 \times 128 \times 256 \times 1024\) , and then through three more dense layers of size \(512 \times 512 \times D\) , where \(D\) is the dimensionality of \(\mathcal{R}\) .

We used the Adam optimiser with the default learning rate of 0.001, a batch size of 32, and the MAE loss (for this specific task, we empirically verified that it performed better than MSE). A Leaky ReLU activation function has been employed between layers, with the exception of the last layer, which uses a linear activation function. The dataset is composed of 726 airplane point clouds taken from the ModelNet10 dataset [191], with 626 training and 100 testing samples; of the 626 training samples, 30% was employed for validation. Each point cloud is composed of \(N = 3000\) points. To each of the 726 point clouds is associated a random rotation matrix, from which we obtained the 8 different representations tested.

b4b91e1f5ced9a2bc4a7f3b038cf3fb6_img.jpg

The diagram illustrates the network architecture for pose estimation: the concatenated reference and target point clouds \(P_r, P_t\) are processed through five 1D convolutional layers (6, 64, 128, 256, and 1024 filters), followed by two dense layers of 512 units each and an output layer producing the \(D\) parameters of \(\mathcal{R}\).

Fig. 2.1 The network for the pose estimation from 3D point clouds problem. Input: concatenated reference and target point clouds \(P_r, P_t \in \mathbb{R}^{N \times 3}\) . Output: rotation from \(P_r\) to \(P_t\) with \(D\) parameters.

Results are summarized in Table 2.3, ordered by increasing mean geodesic error, along with the number of DoF of each representation. The column "Epochs" specifies the number of training epochs for each representation. Firstly, it can be noticed that using MAE instead of the \(L_2\) norm between \(M\) and \(M'\) yields a lower error for the matrix representation compared to the 6D representation, in contrast with the findings in [201]. Secondly, the \(180^\circ\) maximum error is present only in some of the measurements, and not consistently across all the representations analyzed. This leads to a maximum geodesic error, averaged over five experiments, always below \(180^\circ\), even for those representations defined as discontinuous by Zhou et al. (e.g. axis-angle, Euler angles), for which we would expect an error around \(180^\circ\). We believe that the difference between our results and Zhou's lies in the different learning strategy and mapping learned, not in the type of representation.

The distribution of the geodesic error over the testing set along with its percentile plot are given in Fig. 2.2, confirming the superiority of our GA approach. Note how the 75th percentile is achieved at \(15^\circ\) for the bivector (expressed as the logarithm of the rotor) compared to \(21^\circ\) and \(25^\circ\) for the 6D and the matrix representations, respectively.

These results show that a pure geometric algebra description with 3 DoF is as good as a representation with 6 DoF, with a simpler loss function employed and an

Table 2.3 Geodesic Error ( \(^\circ\) ), Pose Estimation from 3D Point Clouds.

| Representation | DoF | Loss | Maximum | Mean | Std. Dev. | Epochs |
|---|---|---|---|---|---|---|
| Matrix | 9 | MAE | 180.0 | 6.91 | 19.8 | 33 |
| 6D | 6 | \(L_2\) | 155.7 | 8.36 | 15.1 | 51 |
| Bivector (\(-2\log R\)) | 3 | MAE | 72.1 | 8.41 | 7.53 | 59 |
| Bivector (\(\frac{1-R}{1+R}\)) | 3 | MAE | 106.8 | 13.9 | 16.0 | 52 |
| Rotor | 4 | MAE | 103.5 | 16.4 | 17.9 | 63 |
| Quaternion | 4 | MAE | 166.5 | 16.4 | 17.2 | 63 |
| Axis-Angle | 3 | MAE | 118.7 | 18.2 | 13.7 | 47 |
| Euler Angles | 3 | MAE | 165.0 | 26.3 | 29.6 | 66 |
| 6D (Zhou et al.) | 6 | \(L_2\) | 179.8 | 2.85 | 9.16 | — |
| Matrix (Zhou et al.) | 9 | \(L_2\) | 180.0 | 4.21 | 9.44 | — |
| Quaternion (Zhou et al.) | 4 | \(L_2\) | 179.7 | 9.03 | 16.3 | — |
| Axis-Angle (Zhou et al.) | 3 | \(L_2\) | 179.7 | 11.9 | 21.4 | — |
| Euler Angles (Zhou et al.) | 3 | \(L_2\) | 179.7 | 14.1 | 23.8 | — |

fcdcb5991d671cadf547dd3fbfe13486_img.jpg

The figure consists of two side-by-side plots. The left plot is a histogram showing the 'Normalized Count' of geodesic error (in degrees) for eight different rotation representations. The x-axis ranges from 0 to 100 degrees, and the y-axis ranges from 0.00 to 0.05. The Matrix representation (green line) has a very sharp, high peak at approximately 10 degrees. The 6D (cyan), Rotor (blue), Quaternion (red), Bivector (Exp) (orange), Bivector (Cayley) (purple), Euler (black), and Axis Angle (yellow) representations show broader, lower peaks. The right plot is a cumulative distribution function (CDF) showing the 'Probability' of the geodesic error (in degrees). The x-axis ranges from 0 to 100 degrees, and the y-axis ranges from 0.0 to 1.0. The Matrix representation (green line) reaches a probability of 1.0 very quickly, around 20 degrees. The other representations reach 1.0 more gradually, with the Axis Angle (yellow) and Euler (black) representations being the slowest to reach 1.0.


Fig. 2.2 Distribution of the geodesic error over the testing set for the proposed representations (left) and related percentile plot (right) for the pose estimation problem.

equivalent geodesic error formulation measured exclusively through rotors, without the need to employ rotation matrices.

7b96fce298a23fd76a01ff6c176c1059_img.jpg

Approximate values read from the two plots:

| Noise Std. Dev. | Mean Geodesic Error (°), Bivector | Mean Geodesic Error (°), 6D | Mean Euclidean Distance, Bivector | Mean Euclidean Distance, 6D |
|---|---|---|---|---|
| 0 | ~10 | ~10 | ~0.003 | ~0.003 |
| 1 | ~60 | ~110 | ~0.011 | ~0.010 |
| 2 | ~65 | ~125 | ~0.013 | ~0.012 |
| 5 | ~68 | ~130 | ~0.012 | ~0.013 |

The figure shows two line graphs comparing the bivector (red) and 6D (blue) representations: mean geodesic error (left) and mean Euclidean distance (right) as a function of the noise standard deviation (0 to 5). The bivector consistently shows a lower geodesic error than the 6D representation as noise increases.

Fig. 2.3 Average geodesic error (left) and Euclidean distance (right) as a function of the standard deviation of the noise applied on \(P_r, P_t\)

Noisy Point Clouds

We then studied the case in which Gaussian noise is added to the point clouds: in a realistic scenario, it is safe to assume imperfect acquisition of samples from a scene (e.g., due to a shaky camera). We restricted the analysis to the two representations with the lowest error in the noiseless case, namely the (exponential) bivector and the 6D representation. Noise has been modelled as additive white Gaussian noise (AWGN) \(\sim \mathcal{N}(0, \sigma^2)\), with \(\sigma \in \{0.1, 1, 2, 5\}\), added independently to the reference and target point clouds \(P_r, P_t\). The geodesic error and Euclidean distance measured for the two representations on the testing set are given in Figure 2.3.
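The AWGN corruption can be sketched as follows; the point cloud here is a random stand-in, not ModelNet10 data:

```python
import numpy as np

# Corrupt a point cloud with additive white Gaussian noise N(0, sigma^2),
# applied independently to every coordinate of every point.
rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, size=(3000, 3))      # stand-in N x 3 point cloud
sigma = 2.0
P_noisy = P + rng.normal(0.0, sigma, size=P.shape)
assert P_noisy.shape == P.shape
```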

It can be seen how, in high noise scenarios, the network likely fits to the noise when trying to predict more parameters of the 6D representation instead of the 3 DoF of the bivector. The average geodesic errors are noticeably higher for the 6D representation than for the bivector case as noise is increased. This is also mirrored in the Euclidean distance between original and predicted point clouds.

The 6D representation is very accurate with little-to-no noise, but it fails when the standard deviation of the noise increases. The bivector representation, on the other hand, is more tolerant to high noise conditions, showing a consistently lower geodesic error at a fraction of the required DoF and with a simpler loss function employed during training.

2.2.5 Inverse kinematics

The goal of this experiment is to estimate the rotation from the T-pose to any arbitrary pose of a skeleton. By T-pose we mean the skeleton standing with arms stretched out, commonly used in computer graphics for calibration purposes. The input of the network is the 3D positions of the \(N = 31\) joints of a skeleton, \(P = (p_1, p_2, p_3, \dots, p_N)\) , where \(p_i = (x_i, y_i, z_i)^T\) . The outputs of the network are the rotations that a T-pose skeleton undergoes to reach the input position. The output is expressed as \(R = (r_1, r_2, r_3, \dots, r_N)\) , with \(r_i \in \mathbb{R}^D\) , where \(D\) is the dimension of the representation. The rotations are hierarchical, in the sense that the rotation of each joint depends on the previous one and on how they are linked to each other. The adopted NN is a four-layer MLP with 1024 neurons per layer (see Figure 2.4). The problem is formulated as a supervised learning problem, which allows us to use the MSE loss consistently, with the only exception being the 6D representation.

We used 760 clips from the CMU Motion Capture Database (MoCap) [72]. From these clips we selected 10000 frames, with a 67-33 train-test split. 30% of the training set was employed for validation. We centered the position of the root joint (defined as the midpoint between the two hips) at the origin so as to predict solely the rotational component of the joints and not the translation of the skeleton. The network was trained with batch size 64. The average geodesic error between ground-truth (from the testing set) and predicted rotations, and the Euclidean distance of the predicted poses from the ground truth, are displayed in ascending order in Tables 2.4 - 2.5, respectively.

977811d1c73b74f801be9f4c376694ca_img.jpg

Diagram of the neural network architecture for inverse kinematics. On the left, a 3D plot shows a red stick figure (skeleton) in a T-pose. An arrow points from this figure to a neural network diagram. The network consists of an input layer with three nodes labeled x1, x2, and x3. This is followed by four 'Dense' layers, each containing 1024 nodes (represented by blue circles). The final output layer contains three nodes labeled y1, y2, and y3. All nodes in adjacent layers are fully connected.

Fig. 2.4 The network for the inverse kinematic problem. Input: frames with \(31 \times 3\) spatial coordinates. Output: 31 rotations with \(D\) parameters.

a7d6560ff54237234261b647f30ec25c_img.jpg

Figure 2.5 consists of two plots. The left plot is a normalized count histogram of geodesic error in degrees, ranging from 0 to 160. The y-axis is 'Normalized Count' from 0.000 to 0.014. The right plot is a cumulative probability plot of geodesic error in degrees, ranging from 0 to 200. The y-axis is 'Probability' from 0.0 to 1.0. Both plots compare six representations: Matrix (green), 6D (cyan), Quaternion (yellow), Bivector (Exp) (blue), Bivector (Cayley) (red), Euler (magenta), and Axis Angle (black). In the histogram, Bivector (Exp) has the highest peak at approximately 60 degrees, while Matrix has the lowest. In the cumulative plot, all representations show similar performance, reaching a probability of 1.0 around 150 degrees.


Fig. 2.5 Distribution of the geodesic error over the testing set for the proposed representations (left) and related percentile plot (right) for the inverse kinematic problem.

9bc3f944f7175ab90d634f6e577acaae_img.jpg

Figure 2.6 consists of two plots. The left plot is a normalized count histogram of Euclidean distance in cm, ranging from 2 to 10. The y-axis is 'Normalized Count' from 0.0 to 0.7. The right plot is a cumulative probability plot of Euclidean distance in cm, ranging from 2 to 10. The y-axis is 'Probability' from 0.0 to 1.0. Both plots compare the same six representations as in Figure 2.5. In the histogram, the Matrix representation (green) has the highest peak at approximately 4.5 cm, while Bivector (Exp) (blue) has the lowest. In the cumulative plot, Matrix reaches a probability of 1.0 around 6 cm, while Bivector (Exp) reaches it around 8 cm.


Fig. 2.6 Distribution of the Euclidean distance between predicted pose and T-pose over the testing set for the proposed representations (left) and related percentile plot (right), inverse kinematic problem.

Table 2.4 Geodesic Error ( \(^\circ\) ), Inverse Kinematics.

| Representation | Loss | Maximum | Mean | Std. Dev. | Epochs |
|---|---|---|---|---|---|
| Bivector \((\frac{1-R}{1+R})\) | MSE | 168.4 | 38.5 | 31.0 | 39 |
| Matrix | MSE | 180.0 | 37.6 | 59.6 | 84 |
| 6D | \(L_2\) | 180.0 | 46.9 | 39.1 | 31 |
| Axis-Angle | MSE | 180.0 | 58.6 | 40.3 | 104 |
| Bivector, \(-2 \log R\) | MSE | 160.1 | 60.5 | 27.1 | 46 |
| Euler Angles | MSE | 180.0 | 61.0 | 41.6 | 53 |
| Quaternion | MSE | 180.0 | 63.3 | 46.6 | 63 |

Table 2.5 Euclidean Distance (cm), Inverse Kinematics.

| Representation | Loss | Maximum | Mean | Std. Dev. | Epochs |
|---|---|---|---|---|---|
| Matrix | MSE | 7.76 | 4.37 | 0.68 | 84 |
| Bivector \((\frac{1-R}{1+R})\) | MSE | 8.48 | 4.98 | 0.79 | 39 |
| 6D | \(L_2\) | 9.21 | 5.48 | 1.28 | 31 |
| Axis-Angle | MSE | 9.26 | 5.53 | 1.09 | 104 |
| Euler Angles | MSE | 9.07 | 5.58 | 1.11 | 53 |
| Quaternion | MSE | 9.26 | 5.71 | 1.10 | 63 |
| Bivector, \(-2 \log R\) | MSE | 9.15 | 6.95 | 0.75 | 46 |

It can be noticed that there is no perfect correspondence between a low geodesic error and an accurate motion estimation. Moreover, the difference in Euclidean distance between different representations is small, at most 1.5 cm. The small difference between the representations is probably due to the supervised formulation of the problem. The bivector expressed as the Cayley transform of the rotor allowed the second best pose estimation after the rotation matrix in terms of mean Euclidean distance. The rotation matrix proved to be consistently the best representation across all three experiments, as expected given its higher number of DoF.

The distributions over the testing set of the geodesic error and Euclidean distance along with their percentile plots are given in Fig. 2.5 - 2.6, respectively. Note how in Fig. 2.5 the 75th percentile is achieved at about \(55^\circ\) and \(75^\circ\) for the (Cayley) bivector and the 6D and (log) bivector, respectively. Similarly, in Fig. 2.6, the 75th percentile is at about 4.5 cm for the matrix representation, 5 cm for the (Cayley) bivector and above 6 cm for the 6D representation.

To guarantee fairness of comparison between different representations in the pose estimation and inverse kinematics experiments, we regularized the training of the architectures by implementing early stopping with patience of \(P = 10\) epochs. The validation loss curves for the two experiments are shown in Fig. 2.7. They have been normalized to show convergence rather than their absolute value. For the sanity check, instead, we fixed \(E = 100\) as no overfitting was observed.
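The early-stopping rule with patience \(P\) can be sketched generically; `train_with_early_stopping` and `step` are hypothetical names, with `step` standing in for one training epoch that returns the validation loss:

```python
def train_with_early_stopping(step, patience=10, max_epochs=1000):
    """Run `step(epoch)` (which returns the validation loss) until the
    loss has not improved for `patience` consecutive epochs."""
    best_loss, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        loss = step(epoch)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # patience exhausted: stop training
    return best_loss, best_epoch

# Synthetic loss curve: improves for 4 epochs, then plateaus.
losses = [1.0, 0.5, 0.25, 0.2] + [0.3] * 20
best, when = train_with_early_stopping(lambda e: losses[e], patience=10)
```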

53d8bef47c63e0897de4cd058bad2cbd_img.jpg

Figure 2.7: Two line plots showing Normalized Validation Loss versus Epochs for the different rotation representations. The left plot is for the pose estimation problem (Epochs 0-70) and the right plot is for the inverse kinematic problem (Epochs 0-120). Both plots compare the representations: Matrix (blue), 6D (orange), Quaternion / Rotor (green), Bivector (Cayley) (cyan), Bivector (Exp) (magenta), Axis Angle (yellow), and Euler (black). In both cases, the loss for all representations decreases rapidly and stabilizes at low values, with the Matrix and 6D representations generally showing the lowest final loss.

Fig. 2.7 Validation loss with early stopping of each of the 8 representations in the pose estimation problem (left) and inverse kinematic problem (right).

2.3 Molecular geometry optimisation through rotors

2.3.1 Background

Optimising the geometry of a molecule means finding the arrangement in 3D space of its atoms for which the overall energy of the molecule is minimised. The stability of molecular conformations is particularly relevant when designing drugs and studying their effects: a molecule might exist in nature with multiple geometries, or conformers. Different conformers of the same molecule, for example, might or might not be toxic to humans. Hence the study of molecular conformations is of particular relevance.

We will focus our attention on proteins. A protein can be thought of as a chain of amino acids. Each amino acid is characterized by a side chain, which differentiates it from the others, bonded to a \(C_\alpha\) atom. The \(C_\alpha\) is itself bonded to an \(N\) atom and to a \(C\) atom. The repetition of \(N - C_\alpha - C\) triplets in each amino acid in the protein chain defines the protein backbone.

Generally, the 3D structure of a protein is modelled through dihedral angles \(\phi, \psi, \omega\) (i.e. internal coordinates) for the backbone and \(\chi_i\) for the side chains. Each dihedral angle defines a rotation around a bond. Performing a geometry optimisation by exploring the space of dihedral angles is time consuming and implies different axes of rotation for different dihedral angles.

An example of this approach can be found in [123], in which the fitness function of the optimisation algorithm is a function of the dihedral angles. This strategy has two drawbacks: the first is that the number of degrees of freedom, even for a single amino acid with few atoms, is large. For a single amino acid such as aspartic acid, for example, six dihedral angles were employed \(\{\phi, \psi, \omega, \chi_1, \chi_2, \chi_3\}\) . In fact, Mancini et al. [123] do not report results on dipeptides (i.e. chains of two amino acids, or peptides). The second is that each dihedral angle specifies a rotation with respect to different axes with different pivot points, making rotations difficult to deal with.

In this Section we implement an evolutionary algorithm (EA) to search for stable conformers of molecules by performing an exploration over the rotor space. We associate an amino acid pair with a rotor that describes their relative orientation. Rotors are then employed as a metaheuristic to guide the minimisation of the overall energy of the molecular structure. The proposal of rotors drives the ‘evolution’ of the molecule towards its more stable conformation. We believe that the mathematical formalism of rotors is particularly suitable to describe molecular structures and in particular proteins, which are inherently regular.

We then compare the rotor-guided algorithm with different representations and discuss whether rotors are an optimal representation for the exploration of molecular conformers. We verified this approach on two simple molecules well understood in the literature, the glycine-glycine dipeptide and the glycine-phenylalanine dipeptide.

2.3.2 Molecules as rotors

We model a molecular structure as follows: we associate a rotor \(R\) to each pair of amino acids. A molecule of \(p + 1\) amino acids is then characterised by a set of \(p\) rotors \(\mathbf{R} = \{R_0, \dots, R_{p-1}\}\) . Each rotor \(R_i\) applies a uniform rotation to each atom from the \(i + 1\) -th amino acid onwards.

A rotor \(R\) is described by 4 real parameters \(\alpha, \beta_{12}, \beta_{13}, \beta_{23}\) (which could actually be reduced to 3, since \(\alpha^2 + \beta_{12}^2 + \beta_{13}^2 + \beta_{23}^2 = 1\) ). Four parameters to describe a dipeptide are a significant decrease in the number of degrees of freedom of the system compared to the 6 parameters of [123] for a single peptide. We do not include any rotation of the side chain, as we decided to simplify the treatment as much as possible by considering only the backbone of the molecule and letting the side chain rotate with it. We optimise the side chain position in 3D space through local geometry optimisation with the software xTB [59] as a refinement step before the fitness evaluation.

In the case of small molecules, in which clashes between planes of amino acids are unlikely to happen, it is possible to leave the rotor unconstrained and to employ completely random rotors, so as to maximize the exploration space.

However, to make sure the generated molecular conformations are sound from a chemical point of view, we can further constrain the rotor space by limiting the amino acids to two possible movements, i.e. a rotation about the \(C_i - N_{i+1}\) axis, specified by the vector \(x_{C-N}\) , corresponding to a twist of the amino acid, or a rotation about the \(C_{\alpha,i} - C_i\) axis, specified by the vector \(x_{C_{\alpha}-C}\) , corresponding to a fold of the backbone chain (see Fig. 2.8). The corresponding rotors can be obtained as

\[R_\theta = e^{Ix_{C-N}\theta} = e^{B_{C-N}\theta} \quad (2.6)\]

\[R_\phi = e^{Ix_{C_{\alpha}-C}\phi} = e^{B_{C_{\alpha}-C}\phi} \quad (2.7)\]

in which \(I = e_{123}\) is the pseudoscalar of \(\mathcal{G}_{3,0}\) , \(B_{C-N}\) is the bivector specifying the plane of rotation perpendicular to the \(C - N\) axis, \(B_{C_{\alpha}-C}\) is the bivector specifying the plane of rotation perpendicular to the \(C_{\alpha} - C\) axis, \(\theta\) is the angle of twisting and \(\phi\) the angle of folding.
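Because a unit bivector squares to \(-1\), its exponential collapses to a cosine-sine form, which is how the constrained rotors of Eqs. 2.6-2.7 can be generated numerically. A minimal sketch (the helper name is ours; whether the angle argument is the full or half rotation angle depends on the chosen rotor convention):

```python
import math

def rotor_exp(b12, b13, b23, theta):
    """e^{B theta} for a unit bivector B = b12 e12 + b13 e13 + b23 e23.
    Since B^2 = -1, the exponential series gives cos(theta) + B sin(theta).
    Returned as (alpha, beta12, beta13, beta23), a unit rotor."""
    c, s = math.cos(theta), math.sin(theta)
    return (c, s * b12, s * b13, s * b23)

# A twist of 30 degrees in the e12 plane; coefficients stay on the unit sphere.
r = rotor_exp(1.0, 0.0, 0.0, math.radians(30.0))
```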

In this case, for a molecule of \(p + 1\) amino acids, we will then have \(2p\) rotors \(\mathbf{R} = \{R_{\theta,0}, R_{\phi,0}, \dots, R_{\theta,p-1}, R_{\phi,p-1}\}\) , further reducing the numbers of degrees of freedom.

73b92c6496912079a4c728bb0c4fdbfd_img.jpg

(a) Twisting rotation along the \(x_{C-N}\) axis,
or equivalently in the \(B_{C-N}\) plane.

(b) Folding rotation along the \(x_{C_{\alpha}-C}\) axis,
or equivalently in the \(B_{C_{\alpha}-C}\) plane.

Figure 2.8 shows two diagrams of a tripeptide molecule illustrating the two types of rotors. (a) Twisting rotation along the x_{C-N} axis, or equivalently in the B_{C-N} plane. The diagram shows the molecule with a blue plane B_{C-N} and a red dashed line representing the x_{C-N} axis. (b) Folding rotation along the x_{C_{\alpha}-C} axis, or equivalently in the B_{C_{\alpha}-C} plane. The diagram shows the molecule with a blue plane B_{C_{\alpha}-C} and a red dashed line representing the x_{C_{\alpha}-C} axis.

Fig. 2.8 The two types of rotors employed to model molecular structures.

2.3.3 The evolutionary algorithm

We implemented a differential evolution (DE) algorithm, which belongs to the class of evolutionary algorithms (EAs). DE is a popular optimisation algorithm first introduced in [167] and employed to minimise real multivariate functions. DE starts with an initial population of candidates and keeps the candidates with the best fitness, i.e. those for which the minimised function reaches a (local) minimum as the parameters are varied. The main advantage of DE is a simple problem formulation which does not require the objective function to be differentiable, i.e. it is a gradient-agnostic optimisation method.

In our case, the individual candidates are the 3D structures of molecules, encoded as .xyz files, containing the atomic symbol and the corresponding position in \(\mathbb{R}^3\) for each atom in the molecule. Borrowing from the jargon of genetic algorithms, each individual can be described by a set of genes, each of which is expressed in terms of its alleles. Our goal is to employ rotors as genes of the DE algorithm.

The objective function to be minimised is defined as

\[f(\mathbf{R}) = \Delta E(\mathbf{R}) = E_k(\mathbf{R}) - E_0 \quad (2.8)\]

where \(E_0\) is the energy of the initial molecular conformation, taken as the reference structure, and \(E_k\) is the energy of the \(k\) -th individual of the population, which is obtained applying a set of rotors \(\mathbf{R} = \{R_0, R_1, \dots, R_{p-1}\}\) , with \(p\) being the total number of genes, on the initial reference structure.

The objective function is hence a map of the type \(f : \mathbb{R}^{p \times d} \rightarrow \mathbb{R}\) , in which \(d\) is the dimensionality of the alleles, which depends on their parametrisation. For example, our gene \(R = \alpha + \beta_{12}e_{12} + \beta_{13}e_{13} + \beta_{23}e_{23}\) between each pair of \(N - C_\alpha - C\) triplets can be parametrised by its coefficients. In that case the DE algorithm will explore the space spanned by the coefficients \(\{\alpha, \beta_{12}, \beta_{13}, \beta_{23}\}\) , meaning that \(f : \mathbb{R}^{(p-1) \times 4} \rightarrow \mathbb{R}\) , in which \(p\) is the number of triplets, or amino acids, in the molecule. Conversely, if we fix the axes of rotation to two possible movements, as shown in Fig. 2.8, we only have one angle per rotor as degree of freedom, meaning that the DE algorithm will only explore the space spanned by \(\{\theta, \phi\}\) . This implies that in this case \(f : \mathbb{R}^{2(p-1)} \rightarrow \mathbb{R}\) , in which the factor of 2 comes from the fact that now we have 2 rotors per each pair of \(N - C_\alpha - C\) triplets.

The energy \(E_k(\mathbf{R})\) is evaluated through the single-point xTB method [59], meaning it only requires the structure geometry and avoids element-pair-wise parametrisation. The absolute energy, measured in hartrees (symbol: \(E_h\) ), is not insightful per se, which is why we defined our function in terms of the deviation from a reference instead. A negative \(\Delta E(\mathbf{R})\) means that the molecular conformation of the \(k\) -th individual, with energy \(E_k\) , is more energetically favourable than the starting conformation with energy \(E_0\) .

The rest of the algorithm works as follows, and is summarized in Algorithm 1: for each individual, i.e. candidate molecular structure, with initial genes \(\mathbf{R}_x = \{R_{x,0}, \dots, R_{x,n}\} \in \mathbb{R}^n\) , with \(n = \{(p-1) \times 4, 2(p-1)\}\) depending on the representation:

  • Three different structures are picked, obtained by rotating the reference conformation with genes \(\mathbf{R}_a, \mathbf{R}_b, \mathbf{R}_c\) such that \(\mathbf{R}_a \neq \mathbf{R}_b \neq \mathbf{R}_c \neq \mathbf{R}_x\) .
  • Select a random index \(P \in \{0, \dots, n\}\) corresponding to a random bond in the chain.
  • For each \(i \in \{0, \dots, n\}\) pick a uniformly distributed random number \(r_i \sim U(0, 1)\)
  • If \(i = P\) or \(r_i < CP\) , in which \(CP\) is the crossover probability of the algorithm, then select \(R_{y,i} = R_{a,i} + F(R_{b,i} - R_{c,i})\) , in which \(F\) is the differential weight parameter of the algorithm; otherwise keep \(R_{y,i} = R_{x,i}\) .
  • Build \(\mathbf{R}_y = \{R_{y,0}, \dots, R_{y,n}\}\)
  • if \(f(\mathbf{R}_y) \leq f(\mathbf{R}_x)\) , then replace the conformation described by the set of rotors \(\mathbf{R}_x\) with that described by the set of rotors \(\mathbf{R}_y\) .
  • Repeat until the maximum number of iterations has been exceeded or convergence has been reached.

Algorithm 1 Differential Evolution for Molecular Structure Optimization


Require: Initial genes \(\mathbf{R}_x = \{R_{x,0}, \dots, R_{x,n}\} \in \mathbb{R}^n\) , where \(n = \{(p-1) \times 4, 2(p-1)\}\)

Require: Crossover probability \(CP\) , differential weight \(F\)

1: while stopping criteria not met do
2:   Select three different structures \(\mathbf{R}_a, \mathbf{R}_b, \mathbf{R}_c\) such that \(\mathbf{R}_a \neq \mathbf{R}_b \neq \mathbf{R}_c \neq \mathbf{R}_x\)
3:   Select a random index \(P \in \{0, \dots, n\}\)
4:   for \(i \in \{0, \dots, n\}\) do
5:     Sample \(r_i \sim U(0, 1)\)
6:     if \(i = P\) or \(r_i < CP\) then
7:       Set \(R_{y,i} = R_{a,i} + F(R_{b,i} - R_{c,i})\)
8:     else
9:       Set \(R_{y,i} = R_{x,i}\)
10:    end if
11:  end for
12:  Construct \(\mathbf{R}_y = \{R_{y,0}, \dots, R_{y,n}\}\)
13:  if \(f(\mathbf{R}_y) \leq f(\mathbf{R}_x)\) then
14:    Replace \(\mathbf{R}_x\) with \(\mathbf{R}_y\)
15:  end if
16: end while
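Algorithm 1 can be sketched end-to-end in a few lines. This is a generic DE/rand/1/bin sketch, not the thesis pipeline: the xTB energy evaluation is replaced by a toy quadratic objective, and the parameter names `cp` and `fweight` mirror \(CP\) and \(F\):

```python
import random

def differential_evolution(f, dim, n_pop=20, cp=0.9, fweight=0.8,
                           iters=200, bounds=(-1.0, 1.0), seed=0):
    """Minimise a generic objective f: R^dim -> R with DE/rand/1/bin,
    following the structure of Algorithm 1."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_pop)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        for k in range(n_pop):
            # Three distinct donors, all different from the current individual.
            a, b, c = rng.sample([i for i in range(n_pop) if i != k], 3)
            p = rng.randrange(dim)  # index that always crosses over
            trial = [pop[a][i] + fweight * (pop[b][i] - pop[c][i])
                     if i == p or rng.random() < cp else pop[k][i]
                     for i in range(dim)]
            f_trial = f(trial)
            if f_trial <= fit[k]:  # greedy selection step
                pop[k], fit[k] = trial, f_trial
    best = min(range(n_pop), key=fit.__getitem__)
    return pop[best], fit[best]

# Toy stand-in for Delta E: a quadratic with its minimum at (0.5, ..., 0.5).
x, fx = differential_evolution(lambda v: sum((vi - 0.5) ** 2 for vi in v), dim=4)
```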

2.3.4 Examples

The Gly-Gly dipeptide

b20359eb231b14e1ebcf2beccf3a37ba_img.jpg

A ball-and-stick model of the Gly-Gly dipeptide in its initial conformation. The backbone atoms (carbon, oxygen, nitrogen) are colored green, red, and blue respectively. The side chains (hydrogens) are white. The structure is in a high-energy, energetically unfavourable state.


(a) The initial conformation of Gly-Gly. Both oxygen atoms (in red) and nitrogen atoms (in blue) have been manually placed on the same side of the structure, which results in an energetically unfavourable structure.

2fdf14646d51f8b3cfa9c93bbdde7a67_img.jpg

A ball-and-stick model of the Gly-Gly dipeptide in its final conformation, the \(C_7\) conformer. The backbone atoms are colored green, red, and blue. The structure is more compact and represents the most stable conformer of the molecule.

(b) The conformation reached at the end of the EA. It corresponds to the \(C_7\) conformer, which is known to be the most stable conformer of the Gly-Gly molecule.

Fig. 2.9 Starting and ending points of the DE algorithm for the Gly-Gly molecule.

Glycylglycine is the simplest dipeptide, composed of two glycine amino acids joined by a peptide bond. Glycine itself is the simplest amino acid, as it has a single hydrogen atom as its side chain. We chose the glycine dipeptide because it is a molecule of particular interest: it has been found in comets by NASA’s Stardust spacecraft [160], corroborating the hypothesis of an exogenous origin of life on Earth and the hypothesis of the existence of other forms of life in the Universe [83, 1]. Moreover, the geometry of its two most stable conformers, known as \(C_5\) and \(C_7\) , has been thoroughly characterized both theoretically and experimentally [151].

The EA for the glycine dipeptide has one gene, corresponding to one rotor, which rotates in 3D space the \(N - C_\alpha - C\) triplet of the second peptide plane and all the atoms bonded to it, while keeping the first fixed. We tested both cases with an unconstrained rotor per amino acid pair (i.e. a gene with 4 alleles, the rotor coefficients) and two constrained rotors per amino acid pair (i.e. a gene with 2 alleles, the folding and twisting angles).

We ran the EA with an initial population size of \(5 \times 4 = 20\) individuals for the unconstrained case and \(5 \times 2 = 10\) individuals for the constrained case, randomly generated with Latin hypercube sampling [67] to maximize the coverage of the specified range, and let it run for 15 iterations. The initial structure for the dipeptide has been arbitrarily picked to have high conformational energy (see Fig. 2.9a).
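Latin hypercube sampling stratifies each coordinate axis into \(n\) equal bins and uses each bin exactly once per axis, giving better coverage than plain uniform sampling. A minimal sketch, with `latin_hypercube` a hypothetical helper rather than the thesis implementation:

```python
import random

def latin_hypercube(n, dim, lo=0.0, hi=1.0, seed=0):
    """Draw n points in [lo, hi]^dim so that, along every axis, each of
    the n equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    cols = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)  # pair strata with points in random order
        cols.append([lo + (hi - lo) * (s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in cols) for i in range(n)]

# 10 initial individuals with 3 alleles each, covering [0, 1]^3.
pts = latin_hypercube(10, 3, seed=1)
```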

The rotor spaces explored for two different allele parametrisations are shown in Fig. 2.10. It is not surprising to see how, by limiting the axes of rotation, the space searched is constrained to two unit circles on \(\Sigma\) , helping the algorithm to reach convergence. In the specific case of the Gly-Gly dipeptide, however, as few clashes happen, both approaches yield identical results, converging to the \(C_7\) conformation, shown in Fig. 2.9b.

89b3389b2c4fa1b452cf2b5daa5f82ad_img.jpg

(a) Alleles parametrised as \(\{\alpha, \beta_{12}, \beta_{13}, \beta_{23}\}\) . \(\mathbf{R} = \{R_0\}\) is unconstrained. The blue marker indicates the minimum.

(b) Alleles parametrised as \(\{\theta, \phi\}\) . \(\mathbf{R} = \{R_{\theta,0}, R_{\phi,0}\}\) have fixed rotation axes. Green: \(R_\theta\) , magenta: \(R_\phi\) . Larger markers indicate the minima.

Figure 2.10: Bivector coefficients of R on the unit sphere Σ for the Gly-Gly dipeptide. (a) shows a dense grid of red and blue points on a sphere, with a single blue marker at the bottom indicating the minimum. (b) shows a sphere with two distinct circles of points (green and magenta) and larger markers at the bottom indicating minima.

Fig. 2.10 Bivector coefficients of \(\mathbf{R}\) on the unit sphere \(\Sigma\) for the Gly-Gly dipeptide for each iteration.

The Gly-Phe dipeptide

We then studied the glycine-phenylalanine dipeptide. It has been chosen to verify whether the presence of a larger side chain, such as the benzene ring of the phenylalanine, could impact the quality of the prediction, and whether our pipeline could break the chirality of the molecule, i.e. its handedness. Just as in the previous case, the initial conformation has been arbitrarily picked to be energetically unfavourable (see Fig. 2.11a), so that the evolution wouldn't be biased by the initial condition. We ran the EA with an initial population size of 20 individuals, randomly generated with Latin hypercube sampling to maximize the coverage of the specified range, and let it run for 15 iterations.

As in the previous case, whenever the proposed rotor breaks the molecular geometry, makes two amino acids clash with each other, or breaks a bond, the resulting 3D structure is not fed into xTB and the value of \(f(\mathbf{R})\) is set to an arbitrarily large value, in our case \(10^4\) . The bivector coefficients explored are shown in Fig. 2.12.

137468e37d54d13b07d012cdae1b69ec_img.jpg

A ball-and-stick model of the Gly-Phe dipeptide in its initial conformation, shown in a relatively flat, planar arrangement. Carbon atoms are green, hydrogen atoms are white, nitrogen atoms are blue, and oxygen atoms are red.

(a) The initial conformation of Gly-Phe. The structure is planar.

27ea092429ce71ffe95b24f4babed33a_img.jpg

A ball-and-stick model of the Gly-Phe dipeptide at the end of the EA. The conformation has changed significantly from the initial planar state, now adopting a more compact, folded shape that resembles the \(C_7\) conformation. The chirality of the phenylalanine residue's side chain is maintained.

(b) The conformation reached at the end of the EA. Gly closes to resemble \(C_7\) , and the chirality of the benzene ring is preserved.

Fig. 2.11 Starting and ending points of the DE algorithm for the Gly-Phe molecule.

022a9e0a9701c3673d8bb68414d6e74e_img.jpg

A 3D plot showing the bivector coefficients of the rotor \(\mathbf{R}\) on a unit sphere \(\Sigma\) , represented by a blue wireframe grid. Two great circles are highlighted: a green one for the \(R_{\theta,0}\) rotation axis and a magenta one for the \(R_{\phi,0}\) rotation axis. Markers of varying sizes lie on the sphere's surface, with larger markers indicating the minima of the objective function.

Fig. 2.12 Bivector coefficients of \(\mathbf{R}\) on the unit sphere \(\Sigma\) for the Gly-Phe dipeptide for each iteration. Alleles are parametrised as \(\{\theta, \phi\}\) . \(\mathbf{R} = \{R_{\theta,0}, R_{\phi,0}\}\) has fixed rotation axes. Green: \(R_{\theta,0}\) , magenta: \(R_{\phi,0}\) . Larger markers indicate the minima.

As in the previous case, despite the larger number of iterations, the algorithm converges to a \(C_7\) conformation, which has the lowest energy of the \(\sim 3000\) conformations explored and is hence energetically more favourable (see Fig. 2.11b).

Table 2.6 Comparison of rotation representations and convergence speed, expressed as number of iterations of the EA, for Gly-Gly and Gly-Phe dipeptides.

| Dipeptide | Dihedral Angles | Rotor (Unconstrained) | Rotor (Constrained) |
|---|---|---|---|
| Gly-Gly | \(\phi_1, \psi_1, \omega, \phi_2, \psi_2\) (5 DoF) | \(R_0\) (4 DoF) | \(R_{0\phi}, R_{0\psi}\) (2 DoF) |
| Gly-Phe | \(\phi_1, \psi_1, \omega, \phi_2, \psi_2, \chi_1\) (6 DoF) | \(R_0, R_1\) (4 DoF) | \(R_{0\phi}, R_{0\psi}\) (2 DoF) |
| Convergence (iterations) | 1020 (Gly-Gly); diverging (Gly-Phe) | 860 (Gly-Gly); diverging (Gly-Phe) | 200 (Gly-Gly); 2550 (Gly-Phe) |

Table 2.6 summarizes the comparison. Rotor-based representations offer a competitive alternative to dihedral-angle parametrizations, which involve a higher number of degrees of freedom and require sequential rotations about distinct local axes. Although dimensionality can be reduced through constrained dihedral formulations, rotor-based approaches benefit from an efficient implementation via the sandwich product, resulting in faster convergence and reduced computational overhead during rotational updates.
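The sandwich product \(R p \tilde{R}\) mentioned above rotates a point with a handful of multiplications. Since rotors in \(\mathcal{G}_{3,0}\) multiply like quaternions, a minimal sketch can reuse the Hamilton product (helper names are ours, and sign conventions vary between texts):

```python
import math

def qmul(a, b):
    """Hamilton product of (w, x, y, z) tuples; rotors in the even
    subalgebra of G(3,0) compose the same way."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def sandwich(r, p):
    """Rotate the point p = (x, y, z) by the unit rotor r via R p ~R."""
    reverse = (r[0], -r[1], -r[2], -r[3])
    w, x, y, z = qmul(qmul(r, (0.0,) + tuple(p)), reverse)
    return (x, y, z)

# A 90-degree rotation about the z axis sends e1 to e2.
half = math.pi / 4  # half angle of the rotation
r = (math.cos(half), 0.0, 0.0, math.sin(half))
out = sandwich(r, (1.0, 0.0, 0.0))
```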

2.4 Conclusions

In this chapter we provided two examples in which rotors are used to parametrise rotations in two different learning problems in an effective and compact way.

In Section 2.2 we have shown that certain representations of rotations might be more suitable than others according to the specific task to be solved. We have also shown how a fully GA description of the rotation allows us to bypass the continuity issue, yielding results which are comparable, if not superior, to the continuous 6D representation without needing a novel representation in all three toy problems presented. We believe that the errors introduced when learning rotations are due to the specific rotation matrix-to-representation mapping, and not to an intrinsic limitation of the representation itself: for this reason, we will make rotors and bivectors our representation of choice, since they are readily defined, more compact and show enhanced robustness to noise.

In Section 2.3 we have demonstrated how rotors can be employed as a metaheuristic for the global geometry optimisation of small molecules. We showed how to model peptides in 3D as collections of rotors, which have fewer degrees of freedom than dihedral angles, and how they can be used to drive an evolutionary algorithm to reach stable conformers in fewer iterations.

Chapter 3

Protein Structure Prediction (PSP)

How do you know that the side you are used to is better than the one to come?

The Forty Rules of Love, Elif Shafak (2009)

3.1 Introduction

In this chapter we focus on the modelling of proteins and on the estimation of their 3D structure. The 3D structure of a protein - known as its tertiary structure - is the arrangement in space of its amino acid chain. It determines the protein's functionality, which is important in drug design and discovery, disease understanding and treatment, and more. Determining the structure experimentally, however, is expensive and time-consuming, and researchers have been employing computational methods instead; hence the field of protein structure prediction (PSP). Here we explore the advantage of using GA in PSP-related problems.

We first use GA as a modelling tool, to generate a novel set of features that capture information about the protein's local folding (its secondary structure) more effectively than the commonly used dihedral angles.

We then use GA as a computational tool, to construct an equivariant neural network in Clifford Algebra that improves the accuracy of PSP pipelines over linear layers.

3.2 Protein structures as 3D CGA planes

As anticipated in Section 2.3, a protein can be simplified into a backbone chain and several side chains. The backbone is mostly responsible for the 3D shape of the protein, and it is composed of a sequence of carbon, nitrogen, and oxygen atoms. The \(\alpha\) -carbons are the main feature of the backbone, to which the side chains that differentiate each amino acid are bonded. Each \(\alpha\) -carbon is preceded by a nitrogen atom and followed by a carbon atom. Hence, to each amino acid \(i\) we can associate a triplet of atoms \(\{N, C_\alpha, C\}_i\) .

Each \(\{N, C_\alpha, C\}\) triplet lies on a plane, constraining the protein folding (see Figure 3.1). We can conveniently model a protein backbone in CGA so that any three \(\{N, C_\alpha, C\}\) atoms lie on a plane (not too dissimilar to the residue gas of [81]): let \(A_i, B_i\) and \(C_i\) be the coordinates of the atoms \(\{N, C_\alpha, C\}_i\) , respectively, expressed in conformal space. The plane associated with residue \(i\) can then be expressed as the 4-blade:

\[\Pi_i = A_i \wedge B_i \wedge C_i \wedge n_\infty \quad (3.1)\]
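Eq. 3.1 constructs \(\Pi_i\) as a CGA 4-blade. As a minimal plain-vector sketch (ours, not the dissertation's CGA implementation; the function name is hypothetical), the same geometric content - the oriented plane through each \(\{N, C_\alpha, C\}\) triplet - can be summarized by the plane's unit normal:

```python
import numpy as np

def backbone_planes(coords):
    """Given backbone coordinates of shape (M, 3, 3) -- one {N, C_alpha, C}
    triplet of 3D points per residue -- return the unit normal of the plane
    through each triplet (a plain-vector stand-in for the CGA 4-blade Pi_i)."""
    N, CA, C = coords[:, 0], coords[:, 1], coords[:, 2]
    normals = np.cross(CA - N, C - CA)  # normal of the N-CA-C plane
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

# Toy example: two residues whose triplets lie in the xy- and xz-planes.
triplets = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],  # xy-plane
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 1.0]],  # xz-plane
])
n = backbone_planes(triplets)
```

The relative orientation of two such planes (used by the rotor of Eq. 3.2 below) is then captured by the angle between their normals.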

Given two planes \(\Pi_i, \Pi_j\) corresponding to the amino acids \(i, j\) , we can compute the rotor that brings one to the other as described in [102]:

\[R_{ij} = \frac{1}{\sqrt{\langle K \rangle_0}} (1 - \Pi_i \Pi_j) \quad (3.2)\]

where \(K = 2 - (\Pi_i \Pi_j + \Pi_j \Pi_i)\) and \(\langle \cdot \rangle_0\) is the grade-0 (scalar) projection operator. We now use the cost function \(C_\lambda(R)\) , which measures how much the rotor \(R\) deviates from the identity, as defined in [46]. \(C_\lambda(R)\) is a weighted sum of a translational and a rotational term:

\[C_{\lambda_1 \lambda_2}(R) = \lambda_1 \langle R_{\parallel} \tilde{R}_{\parallel} \rangle_0 + \lambda_2 \langle (R_{\perp} - 1)(\tilde{R}_{\perp} - 1) \rangle_0 \quad (3.3)\]

in which the translational error is represented by \(R_{\parallel} = R \cdot e\) , the rotational error by \(\langle (R_{\perp} - 1)(\tilde{R}_{\perp} - 1) \rangle_0 = \langle (R - 1)(\tilde{R} - 1) \rangle_0\) , and \(\lambda_1, \lambda_2 \in \mathbb{R}\) are two weighting scalars. As we are interested in orientational features, we will focus exclusively on the rotational part (case \(\lambda_1 = 0, \lambda_2 = 1\) ).

0a5a65d0e03abb559a2790b6490a6f55_img.jpg

Figure 3.1: HIV-1 protease (ID: 1dmp) modelled as a collection of N - C_alpha - C planes. The left panel shows the full protein structure as a collection of red planes in a 3D coordinate system (x, y, z). The right panel is a close-up view of the first 20 residues, showing two specific planes, Pi_1 and Pi_j, and the rotation R_ij between them.

Fig. 3.1 HIV-1 protease (ID: 1dmp) modelled as a collection of \(N - C_\alpha - C\) planes (left) and close up on the first 20 residues (right). Note \(\Pi_j = R_{ij}\Pi_i\tilde{R}_{ij}\) .

3.2.1 Geometrical interpretation of the cost function

A rigorous interpretation of the cost function is derived in [46, 184]. A rotor in 3D conformal space can be represented via an exponential:

\[R = e^B \quad (3.4)\]

in which \(B\) is a bivector. Assuming \(B = \phi P + tn_\infty\) , it is derived in [185] that

\[R = e^B = [\cos(\phi) + P \sin(\phi)][1 + t_\perp n_\infty] + \text{sinc}(\phi)t_\parallel n_\infty \quad (3.5)\]

where \(P\) is a bivector, representing the normalized rotation plane, such that \(P^2 = -1\) , \(2\phi\) is the rotation in plane \(P\) and \(t \in \mathbb{R}^3\) is the translation vector. By grouping the components including and not including \(n_\infty\) , we can rewrite Eq. 3.5 as the sum of a perpendicular and a parallel component of the rotor:

\[\begin{aligned} R &= R_\perp + R_\parallel n_\infty \\ R_\perp &= \cos(\phi) + P \sin(\phi) \\ R_\parallel &= \cos(\phi)t_\perp + \text{sinc}(\phi)t_\parallel + \sin(\phi)Pt_\perp \end{aligned} \quad (3.6)\]

In the given decomposition, \(R_\perp\) represents the rotational component of the rotor, as it consists of terms independent of \(n_\infty\) and describes a pure rotation in the conformal model. In contrast, \(R_\parallel\) captures the translational effects, as it includes terms proportional to \(n_\infty\) , which in CGA encode translations in space.

We can then rewrite the rotational component of Eq. 3.3 as:

\[\begin{aligned} \langle (R_{\perp} - 1)(\tilde{R}_{\perp} - 1) \rangle_0 &= \langle R_{\perp} \tilde{R}_{\perp} - R_{\perp} - \tilde{R}_{\perp} + 1 \rangle_0 \\ &= \langle 1 - \cos(\phi) - P \sin(\phi) - \cos(\phi) + P \sin(\phi) + 1 \rangle_0 \quad (3.7) \\ &= 2(1 - \cos(\phi)) = 4 \sin^2(\phi/2) \end{aligned}\]

The expression of the rotational error is hence a convex function exclusively dependent on the rotation angle \(\phi\) between the two planes, with a maximum of 2 for a rotation of \(2\phi = \pm\pi\) and a minimum at \(\phi = 0\) . This shows that the cost function of Eq. 3.3 can be used as a measure of the relative orientation of amino acids within a protein chain.
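The extrema described above can be checked numerically; a small NumPy sketch (ours) of the rotational cost as a function of the rotor half-angle \(\phi\):

```python
import numpy as np

def rotational_cost(phi):
    """Rotational part of Eq. 3.3 in closed form (Eq. 3.7):
    <(R_perp - 1)(~R_perp - 1)>_0 = 2(1 - cos(phi)) = 4 sin^2(phi / 2)."""
    return 2.0 * (1.0 - np.cos(phi))

phi = np.linspace(-np.pi / 2, np.pi / 2, 101)
cost = rotational_cost(phi)
# The two closed forms agree; minimum 0 at phi = 0, maximum 2 at a
# rotation of 2*phi = +/- pi (i.e. phi = +/- pi/2).
assert np.allclose(cost, 4 * np.sin(phi / 2) ** 2)
```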

3.3 Cost maps: a geometric description of amino acids

Inter-residue interactions are commonly represented as matrices - also called maps. A contact map C of a protein consisting of \(M\) residues, for example, is a binary \(M \times M\) matrix of the type:

\[C_{ij} = \begin{cases} 1 & \text{if } d_{ij} < 15 \text{ \AA} \\ 0 & \text{otherwise} \end{cases} \quad (3.8)\]

where \(d_{ij}\) is the Euclidean distance, in \(\text{\AA}\) , between the \(C_{\alpha}\) coordinates of residues \(i\) and \(j\) . The threshold of \(15 \text{ \AA}\) (where \(1 \text{ \AA} = 10^{-10} \text{ m}\) ) has been picked as an arbitrary reference to define two amino acids in contact with each other; a threshold of 10 or \(8 \text{ \AA}\) would yield similar results. A contact map can hence be interpreted as: two residues are in contact if they are within a certain distance from each other. A more informative metric, usually real-valued, is given by distance maps, which are similarly defined as:

\[D_{ij} = d_{ij} \quad (3.9)\]

From either or both contact and distance maps it is possible to obtain accurate 3D shape estimation. However, when contact or distance maps are predicted and not exact, errors are introduced into the 3D reconstruction step. Having an additional map capturing the orientation between residues can help to further constrain the search space for the protein folding. We can hence employ our cost function to produce a cost map which contains orientational information as follows:

\[M_{ij} = \begin{cases} C_{\lambda_1 \lambda_2}(R_{ij}) & \text{if } d_{ij} < 15 \text{ \AA} \\ 0 & \text{otherwise} \end{cases} \quad (3.10)\]

A comparison between contact map C, distance map D and cost map M is given in Figure 3.2 for an example protein. Note how the range of our cost maps is \([0, 2]\) , which is consistent with Eq. 3.7.
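The three maps of Eqs. 3.8-3.10 share the same distance-thresholding pattern; a sketch in NumPy (function names ours; the per-pair cost of Eq. 3.3 is assumed precomputed and passed in as a matrix):

```python
import numpy as np

def distance_map(ca):
    """Pairwise C_alpha distance map D (Eq. 3.9) from coordinates of shape (M, 3)."""
    diff = ca[:, None, :] - ca[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def contact_map(ca, cutoff=15.0):
    """Binary contact map C (Eq. 3.8): 1 where d_ij < cutoff (in Angstrom)."""
    return (distance_map(ca) < cutoff).astype(int)

def cost_map(ca, pair_cost, cutoff=15.0):
    """Cost map M (Eq. 3.10): the pairwise rotor cost where residues are in
    contact, 0 elsewhere."""
    return np.where(distance_map(ca) < cutoff, pair_cost, 0.0)
```

All three maps are \(M \times M\) and symmetric whenever the underlying pairwise quantity is.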

19e2fdfdf050c05ce74925211913c2c3_img.jpg

Figure 3.2: Comparison of contact, distance, and cost maps for protein chain 2HC5A. (a) Contact map showing binary values (0 or 1) with a color scale from 0.0 to 1.0. (b) Distance map showing continuous values with a color scale from 0 to 40. (c) Cost map showing continuous values with a color scale from 0.00 to 1.75. All maps are 100x100 matrices.

Fig. 3.2 (a) contact, (b) distance and (c) cost map for protein chain 2HC5A.

3.3.1 Cost maps and secondary structures

It is possible to establish a relationship between patterns in cost maps and the protein secondary structure. By secondary structure we refer to the local folding of a segment of a protein, e.g. \(\alpha\) -helices, \(\beta\) -sheets or turns. Secondary structure information is a common feature in PSP pipelines and one of the most important in predicting distance and contact maps, as shown in [38, 2].

By assigning a colour to each secondary structure, it is possible to visualize the secondary structure of each amino acid pair. We arbitrarily assigned red to \(\alpha\) -helices, green to \(\beta\) -sheets, blue to turns and white to all the others. Any combination of these four colours gives the possible secondary structures of the pair, for a total of 10 different colour combinations. As shown in Figure 3.3, there is a clear correspondence between secondary structures and patterns in the cost maps; for example, \(\alpha\) -helix patterns consist of equally spaced parallel lines of low intensity, while \(\beta\) -sheets present high-intensity pixels packed tightly together. These patterns are identical and consistent across different proteins, as shown in Figure 3.4. Note also how patterns between residues which are in contact with each other but belonging to

7b692a27af8e1a2533b06dc024e9db5c_img.jpg

(a)

(b)

Figure 3.3: (a) Cost map and (b) secondary structures for protein chain 4JZK. (a) shows a cost map with a color scale from 0 to 175 on both axes. (b) shows secondary structures with a color scale from 0 to 200 on both axes.

Fig. 3.3 (a) Cost map and (b) secondary structures for protein chain 4JZK. Visualizing the first 200 residues. Red: \(\alpha\) -helices, green: \(\beta\) -sheets, blue: hydrogen bond turns, white: all the others.

a0f1c3f5f43cd5623462acd915d5ce76_img.jpg


Figure 3.4: Examples of (zoomed-in) cost maps for different proteins along with their secondary structures. The figure shows six pairs of images for proteins 3B43, 6J48, 1G03, 2HC5, 1K20, and 1K20. Each pair consists of a cost map on the left and a secondary structure map on the right.

Fig. 3.4 Examples of (zoomed-in) cost maps for different proteins along with their secondary structures. Patterns associated to a same secondary structure are consistent for different protein chains.

different secondary structures are consistent, e.g. helix-sheet interactions (in red-green colour), in the off-diagonal areas, present typical braided patterns.

To the best of our knowledge, this is the first example of an orientational map that also encodes the secondary structure of the protein so clearly: the three angle maps presented in [197], obtained from the dihedral angles of the protein chains and hence also an orientational measure, do not present patterns as evident as those of our cost maps.

3.3.2 Cost maps and different protein conformations

Protein structures are not static. A protein can display different conformations, or “states”, based on its activity - an example is given by enzymes, i.e. proteins that catalyze chemical reactions, which can change their conformation when they bind to different substrates. We then have an “open” state, when the enzyme is ready to bind to the substrate, and a “closed” state, when the enzyme is inactive.

For example, the Lysine-, Arginine-, Ornithine-binding protein has an open state (PDB ID: 2LAO) and a closed state (PDB ID: 1LAF) [164]. Chains 1LAF and 2LAO are shown in Figure 3.5. The corresponding cost maps are given in Figure 3.6. In Figure 3.6c we see that higher activity is associated with the second half of the protein, approximately from residues 80 to 200: this is mirrored in the 3D structure, in which the cyan and magenta chains diverge most for these residues (see Figure 3.5). Note also the red regions in Figure 3.6c: they mean that the chain of Figure 3.6a is folded in such a way that residues in the region 50-150 are in contact (i.e. at a distance below 15 Å), differently from the chain of Figure 3.6b.

8a81bb652afa56d2872e995ad32f9887_img.jpg

A 3D ribbon diagram showing two protein conformations, 1LAF (cyan) and 2LAO (magenta), overlaid. The structures show significant conformational differences, particularly in the open/closed states. The cyan chain (1LAF) is more compact and folded, while the magenta chain (2LAO) is more extended and open. The divergence is most prominent in the middle region of the protein chains.


Fig. 3.5 3D structures for chains 1LAF (cyan) and 2LAO (magenta).

Another example of different protein conformation is given by the spike proteins of viruses: the open (PDB ID: 6VYB) and closed (PDB ID: 6VXX) states of the

366613102f1e34db1de83df8d2115642_img.jpg

Figure 3.6: Cost maps of the first 300 residues of protein chains (a) 1LAF, (b) 2LAO and (c) their difference. The figure consists of three heatmaps. (a) and (b) show cost maps for chains 1LAF and 2LAO respectively, with axes from 0 to 200. (c) shows the difference map with a color scale from -0.05 (blue) to 0.05 (red).

Fig. 3.6 Cost maps of the first 300 residues of protein chains (a) 1LAF, (b) 2LAO and (c) their difference.

SARS-CoV-2 S glycoprotein are shown in Figure 3.7 [121]. The corresponding cost maps are given in Figure 3.8, with Figure 3.8c showing the difference between the two. Note how the two structures are consistent with each other, with the highest deviation around residues 300-400, which correspond to the folding of chain 6VYB coloured in orange in the bottom-left part of Figure 3.7 and absent in chain 6VXX. Note also how these differences would be very hard to spot by considering Figures 3.8a-b individually.

4ffda59da7ddc932f037693d3d393e5f_img.jpg

Figure 3.7: 3D structures for chains 6VXX (blue) and 6VYB (orange). The image shows two ribbon diagrams of protein structures, one in blue and one in orange, overlaid. The structures are complex, with many alpha-helices and beta-sheets. The orange structure (6VYB) has a distinct folding region in the bottom-left that is absent in the blue structure (6VXX).

Fig. 3.7 3D structures for chains 6VXX (blue) and 6VYB (orange).

0074c6f00020bf1c56b95a37a7a9e599_img.jpg

Figure 3.8: Cost maps of the first 500 residues of protein chains (a) 6VXX, (b) 6VYB and (c) their difference. The figure consists of three heatmaps labeled (a), (b), and (c). Each heatmap has x and y axes ranging from 0 to 400. A color bar on the right indicates values from -0.4 (blue) to 0.4 (red). Heatmaps (a) and (b) show a dark blue background with a magenta/purple diagonal band and some scattered points. Heatmap (c) shows a light pink background with scattered magenta/purple points.

Fig. 3.8 Cost maps of the first 500 residues of protein chains (a) 6VXX, (b) 6VYB and (c) their difference.

3.3.3 Cost maps and NMR ensembles

0e80cc1fa0094cc0935c325a44e12545_img.jpg

Figure 3.9: Three different models within the NMR ensemble of protein chain 1BVM with the complete ensemble. The figure shows four ribbon diagrams of a protein structure. The first three are individual models, and the fourth is the complete ensemble. The fourth diagram has two green boxes highlighting specific regions of the protein structure.

Fig. 3.9 Three different models within the NMR ensemble of protein chain 1BVM with the complete ensemble.

Nuclear magnetic resonance (NMR) is a technique in analytical chemistry used to determine the purity or the structure of a sample. By NMR ensemble we refer to a collection of possible foldings of the same protein, which, being flexible, does not possess a unique conformation; the resolved structures differ by small local variations. An average difference of the cost maps associated with multiple structures resolved via NMR can be obtained in three steps:

  • Compute the cost map for each structure in the ensemble, for a total of \(K\) structures and hence \(K\) cost maps.

  • Take the difference between consecutive pairs of cost maps, obtaining \((K - 1)\) differences of cost maps. This approach is taken because consecutive cost maps exhibit smaller divergences from each other.

  • Take the overall average by summing the differences of cost maps and divide the resulting plot by \((K - 1)\) pixel-wise.

Regions with higher intensity will appear in the final plot, meaning that those regions were the most different across the different 3D structures. This highlights residues or regions associated with a higher resolution error or with higher flexibility.
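The three steps above can be sketched in NumPy. One caveat: taken with signed differences, the sum of consecutive differences telescopes to the last map minus the first, so we assume here (an assumption of ours, not stated in the text) that the absolute value of each consecutive difference is intended:

```python
import numpy as np

def average_consecutive_difference(cost_maps):
    """Average difference of an NMR ensemble's cost maps.
    cost_maps: array of shape (K, M, M), one cost map per resolved structure.
    Step 1 is assumed done (the K maps are the input); step 2 takes the
    (K - 1) differences between consecutive pairs (in absolute value, our
    assumption); step 3 averages them pixel-wise."""
    diffs = np.abs(np.diff(cost_maps, axis=0))  # (K-1, M, M)
    return diffs.sum(axis=0) / (cost_maps.shape[0] - 1)
```

High-intensity pixels in the returned map flag residue pairs whose relative orientation varies most across the ensemble.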

We verified this with an NMR ensemble of 20 chains of protein 1BVM (see Figure 3.9): averaging the pairwise differences for each of the 20 chains, we end up with the plot in Figure 3.10. The higher activity around residues 10-50 (corresponding to the green box in Figures 3.9-3.10) or 50-60 (corresponding to the cyan box in Figures 3.9-3.10) signifies that those residues are associated with a higher uncertainty.

Note how, on the contrary, the helix structures are well resolved, which is clear both from Figure 3.9 and from Figure 3.10d, in which residues 30-60 or 90-110, corresponding to helices, have an average cost map difference close to 0. Note also how it is possible to spot the helix structures immediately from Figure 3.10a-b-c.

46354e78fc94fa64a2d49f26a2a24c80_img.jpg

Figure 3.10 consists of four panels labeled (a), (b), (c), and (d). Panels (a), (b), and (c) are cost maps for protein 1BVM, each showing a 2D grid of residues. Each map has a dark background with bright, diagonal, and block-like patterns. Three colored boxes (green, cyan, and yellow) are overlaid on each map, highlighting specific regions of interest. Panel (d) is the average difference of the 20 cost maps. It shows a similar 2D grid but with a color scale on the right ranging from -0.20 (dark blue) to 0.20 (yellow). The map is predominantly dark blue, indicating low average difference, with some bright spots corresponding to the regions highlighted in the other panels.


Fig. 3.10 (a)-(c): Three cost maps associated with 3 resolved structures of the NMR ensemble of chain 1BVM. Note the differences in pattern among the three maps highlighted by the coloured boxes. (d): Average difference of the 20 cost maps. The higher activity in the averaged map can be associated to higher resolution uncertainty or variability in the structure.

3.4 Alternative representations

We are now going to introduce two more maps describing the orientation between pairs of amino acids.

3.4.1 Dihedral angles

Dihedral angles are the most common way to describe the orientation of atoms in the protein backbone. They are the angles between planes formed by different atom triplets in the backbone. Commonly, each amino acid \(i\) can be described with three dihedral angles \(\{\phi_i, \psi_i, \omega_i\}\) .

97e8ba855b4ea537d84f0c782153ec54_img.jpg

Ball-and-stick model of Insulin (ID: 3i40) on the left and a close-up of the backbone in position i on the right. The close-up shows the backbone atoms: C_alpha,i+1, N_i+1, C_i, C_alpha,i, C_beta,i, N_i, and C_i-1. The dihedral angles are highlighted: omega_i (green) between N_i+1-C_i-C_alpha,i and C_i-N_i+1-C_alpha,i+1; phi_i (pink) between C_beta,i-C_alpha,i-C_i and N_i-C_alpha,i-C_i; and psi_i (blue) between C_i-1-N_i-C_alpha,i and N_i-C_alpha,i-C_i.

Fig. 3.11 Ball-and-stick model of Insulin (ID: 3i40, left) and close up on the backbone in position \(i\) with dihedral angles \(\omega_i, \phi_i, \psi_i\) highlighted (right).

Following the convention in [27], the angle \(\psi_i\) is defined as the dihedral angle between the \(C_{i-1} - N_i - C_{\alpha,i}\) and the \(N_i - C_{\alpha,i} - C_i\) planes, the angle \(\phi_i\) as the dihedral angle between the \(C_{\beta,i} - C_{\alpha,i} - C_i\) and the \(N_i - C_{\alpha,i} - C_i\) planes, while \(\omega_i\) is the dihedral angle between the \(N_{i+1} - C_i - C_{\alpha,i}\) and the \(C_i - N_{i+1} - C_{\alpha,i+1}\) planes (see Fig. 3.11). A total of 5 planes, all relative to residue \(i\) , are involved to specify the triplet of dihedral angles \(\{\phi_i, \psi_i, \omega_i\}\) .

In order to build angle maps, we need dihedral angles between pairs of residues. We hence define \(\{\phi_{ij}, \psi_{ij}, \omega_{ij}\}\) as the dihedral angles between the same pairs of planes mentioned above, but with the first plane relative to residue \(i\) and the second relative to residue \(j\) . With this definition we have \(\phi_i = \phi_{ii}, \psi_i = \psi_{ii}\) and \(\omega_i = \omega_{ii}\) . We can hence define the angle maps as follows:

\[\Phi_{ij} = \begin{cases} \phi_{ij} & \text{if } d_{ij} < 15 \text{ \AA} \\ 0 & \text{otherwise} \end{cases} \quad (3.11)\]

\[\Psi_{ij} = \begin{cases} \psi_{ij} & \text{if } d_{ij} < 15 \text{ \AA} \\ 0 & \text{otherwise} \end{cases} \quad (3.12)\]

d9390ad6cde8bd9e9ed97b84dabf08bc_img.jpg

(a) \(\Phi\) (b) \(\Psi\) (c) \(\Omega\)

Figure 3.12: Angle maps for protein 12asA. (a) Φ map, (b) Ψ map, (c) Ω map. Each map is a 300x300 heatmap with a color scale from 0.0 (dark blue) to 3.0 (yellow). The Ψ map shows a distinct asymmetry compared to the others.

Fig. 3.12 Angle maps for protein 12asA.

\[\Omega_{ij} = \begin{cases} \omega_{ij} & \text{if } d_{ij} < 15 \text{ \AA} \\ 0 & \text{otherwise} \end{cases} \quad (3.13)\]

where \(d_{ij}\) is the Euclidean distance between the \(C_\alpha\) 's of residues \(i, j\) measured in Angstroms ( \(1 \text{ \AA} = 10^{-10} \text{ m}\) ). The cut-off distance is required to discriminate which amino acids are in contact with each other, and \(15 \text{ \AA}\) is a recurrent value in the literature; a value of 10 or \(8 \text{ \AA}\) could have also been picked, yielding slightly thinner profiles in the maps. Examples of angle maps are given in Figure 3.12. Note how the patterns in \(\Psi\) are asymmetric.
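In the within-residue case ( \(\phi_i = \phi_{ii}\) , etc.) the planes share a common bond axis, and the dihedral reduces to the standard four-atom formula. A generic NumPy routine for that case (a textbook construction, ours, not the dissertation's code) for the signed angle between the \((p_0, p_1, p_2)\) and \((p_1, p_2, p_3)\) planes about the \(p_1 \to p_2\) axis:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) between the planes (p0, p1, p2) and
    (p1, p2, p3), measured about the p1 -> p2 bond axis.
    Convention: 0 when p0 and p3 are eclipsed (cis), pi when opposed (trans)."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - (b0 @ b1) * b1  # component of b0 orthogonal to the axis
    w = b2 - (b2 @ b1) * b1  # component of b2 orthogonal to the axis
    return np.arctan2(np.cross(b1, v) @ w, v @ w)
```

The angle maps \(\Phi, \Psi, \Omega\) then apply the same \(15 \text{ \AA}\) thresholding to these angles as Eq. 3.10 does to the rotor cost.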

3.4.2 Dot product between oriented points

Oriented points were first introduced in [44] with the name of “tangent bivectors”. In [69], an oriented point \(Q\) in CGA is a trivector, i.e., a circle, with radius \(r = 0\) , defined as

\[Q = I_q \wedge q + \left[ \frac{1}{2} q^2 I_q - q(q \cdot I_q) \right] n_\infty + I_q n_0 + (I_q \cdot q) E. \quad (3.14)\]

In the equation above \(q \in \mathbb{R}^3\) is the 3D position vector defining the centre of \(Q\) , \(I_q\) is the (unit) bivector corresponding to the oriented plane in which \(Q\) lies (orthogonal to the normal \(n_q\) of the plane) and \(E = n_\infty \wedge n_0\) is the origin-infinity bivector.

Given a pair of oriented points \(P, Q\) , the dot product between them was first studied in [71], and it was derived to be equal to

\[P \cdot Q = d^2 \left[ -\frac{1}{2} \cos \alpha_{pq} + \cos \Theta_q \cos \Theta_p \right], \quad (3.15)\]

in which \(d\) is the Euclidean distance between \(p, q \in \mathbb{R}^3\) , \(\alpha_{pq}\) is the dihedral angle between the two planes, with \(\cos \alpha_{pq} = n_p \cdot n_q\) , \(\cos \Theta_p = \hat{d} \cdot n_p\) and \(\cos \Theta_q = \hat{d} \cdot n_q\) , where \(\hat{d}\) is the unit vector joining \(p\) and \(q\) .

Equation 3.15 suggests that, up to a scale factor, the dot product between two oriented points is a function of three angles, hence it encodes the orientation between them. Moreover, similarly to the cost function of Eq. 3.10, Equation 3.15 allows us to obtain a scalar measure from an initial GA description.
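Eq. 3.15 can be evaluated directly from the centres and unit plane normals, without constructing the full trivectors of Eq. 3.14. A sketch (ours; it assumes \(\hat{d}\) denotes the unit vector between the centres, as read above):

```python
import numpy as np

def oriented_point_dot(p, q, n_p, n_q):
    """Scalar dot product of two oriented points (Eq. 3.15), computed from
    the centres p, q and the unit normals n_p, n_q of their planes:
    P . Q = d^2 * (-1/2 cos(alpha_pq) + cos(Theta_p) cos(Theta_q))."""
    d_vec = q - p
    d = np.linalg.norm(d_vec)
    d_hat = d_vec / d
    cos_alpha = n_p @ n_q   # dihedral angle between the two planes
    cos_tp = d_hat @ n_p    # angle between the join direction and each normal
    cos_tq = d_hat @ n_q
    return d ** 2 * (-0.5 * cos_alpha + cos_tp * cos_tq)
```

Dividing by \(d^2\) , as in the dot product maps of Eq. 3.16 below, removes the scale factor; for two parallel coplanar points the normalized value is \(-0.5\) , consistent with the \([-0.5, 0.5]\) range noted in Section 3.6.1.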

As each atom in the backbone can be associated with a set of 3D coordinates and it lies on a plane specified by the bonds to the previous and following atoms in the backbone, it is easy to associate an atom with an oriented point. We will call \(Q_{C_\alpha}\) the oriented point centered on \(C_\alpha\) in the \(N - C_\alpha - C\) plane (see Fig. 3.13), \(Q_{C_\beta}\) the oriented point centered on \(C_\beta\) in the \(C_\alpha - C_\beta - C\) plane and so on.

37799fdee77edb30f545ffbb6968c8ad_img.jpg

The figure consists of two panels. The left panel shows a 3D perspective view of a protein structure, Haemoglobin (ID: 1a3n), represented as a dense collection of blue oriented points. The points are clustered together, forming the shape of the protein. The right panel is a close-up view of five residues, showing the oriented points as blue ellipses on a plane. Two specific oriented points are labeled: \(Q_{C_\alpha,i}\) (in red) and \(Q_{C_\alpha,j}\) (in green). Normal vectors \(n_{q,i}\) (red arrow) and \(n_{q,j}\) (green arrow) are shown originating from these points. A dashed line connects the two points, representing the vector \(d\) between them. The axes are labeled x, y, and z.

Figure 3.13: Haemoglobin (ID: 1a3n) modelled as a collection of oriented points centered at C_alpha (left) and close up on 5 residues with labelled oriented point (right).

Fig. 3.13 Haemoglobin (ID: 1a3n) modelled as a collection of oriented points centered at \(C_\alpha\) (left) and close up on 5 residues with labelled oriented point (right).

We can again build an \(M \times M\) map for each protein chain of length \(M\) , which we will call dot product maps, defined as:

\[N_{X,ij} = \begin{cases} \frac{1}{d_{ij}^2} (Q_{X,i} \cdot Q_{X,j}) & \text{if } d_{ij} < 15 \text{ \AA} \\ 0 & \text{otherwise} \end{cases} \quad (3.16)\]

In the equation above, \(X\) is any atom among \(C_\alpha, C_\beta, C\) or \(N\) , and \(d_{ij}\) is the Euclidean distance between \(X_i\) and \(X_j\) . Note the normalization factor as we are only interested in the orientation between atoms. Examples of dot product maps are given in Figure 3.14.

34cfb1835fdf0d248bc2c4d58956a896_img.jpg

(a) \(\mathbf{N}_{C_\alpha}\) (b) \(\mathbf{N}_{C_\beta}\) (c) \(\mathbf{N}_N\)

Figure 3.14: Dot product maps for protein 3i41A. (a) N_{C_\alpha} map showing a diagonal band of high values. (b) N_{C_\beta} map showing a more complex pattern of high values. (c) N_N map showing a pattern of high values. Each map has a color bar on the right indicating values from -0.4 to 0.4.

Fig. 3.14 Dot product maps for protein 3i41A.

3.5 Dataset: PDNET

We employed an expanded version of PDNET [2] to train our models in Section 3.7 and Section 3.8. The original PDNET includes 1000 proteins for training and 150 proteins for testing, for which the top seven features in contact and distance prediction problems are picked. These features are sequence profiles, secondary structures, solvent accessibility, coevolutionary signals, FreeContact [84], contact potential alignments and Shannon entropy, all predicted starting from the amino acid sequence. The seven features are encoded in PDNET as a stack of 57 \(M \times M\) channels for each of its protein chains of length \(M\) . Of the 57 channels, 3 correspond to pairwise features, i.e., FreeContact, coevolutionary signals and contact potentials, while the remaining 54 are 27 individual features each repeated twice, as a matrix and its transpose (i.e., \(Y\) and \(Y^T\) ).

We expanded PDNET by adding distance maps (defined as \(\mathbf{D}_{ij} = d_{ij}\) , where \(d_{ij} = \|T_i - T_j\|_2\) , with \(T \in \mathbb{R}^{M \times 3}\) being the ground truth coordinates of the \(C_\alpha\) atoms of the protein) and eight orientational maps (namely \(\mathbf{M}_{C_\alpha}, \mathbf{M}_{C_\beta}, \mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}, \mathbf{N}_N, \Omega, \Phi\) and \(\Psi\) ), combined into a total of nine different cases (see Table 3.1).

We then recast the dataset in the form of heterogeneous graphs so that it can be input into the Graph Transformer (GrT, see Section 5.2.1). By heterogeneous graph \(\mathcal{G}(V, E)\) with \(V\) and \(E\) being its set of nodes and edges, respectively, we denote a graph with different types of nodes and edges. If \(|V| = M\) is the total number of nodes, the graph can be described as a set of adjacency matrices for each of the \(K\) edge types. Adjacency matrices are encoded as \(\{A_k\}_{k=1}^K\) , where \(A_k \in \mathbb{R}^{M \times M}\) , or in tensor form \(\mathbf{A} \in \mathbb{R}^{M \times M \times K}\) : we say that \(A_{k;i,j}\) is non-zero when there exists an edge of type \(k\)

Table 3.1 Combinations of orientational features. The column “Planes” specifies how many planes are required to build the corresponding set of features. The column “ \(K\) ” indicates the total number of adjacency matrices of the graph.

| Case | Additional Features | Planes | \(K\) |
| --- | --- | --- | --- |
| (a) | none | - | 4 |
| (b) | \(\mathbf{M}_{C_\alpha}\) | 1 | 5 |
| (c) | \(\mathbf{M}_{C_\alpha}, \mathbf{M}_{C_\beta}\) | 2 | 6 |
| (d) | \(\mathbf{N}_{C_\alpha}\) | 1 | 5 |
| (e) | \(\mathbf{M}_{C_\alpha}, \mathbf{N}_{C_\alpha}\) | 2 | 6 |
| (f) | \(\mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}\) | 2 | 6 |
| (g) | \(\mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}, \mathbf{N}_N\) | 3 | 7 |
| (h) | \(\mathbf{M}_{C_\alpha}, \mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}\) | 3 | 7 |
| (i) | \(\Omega, \Phi, \Psi\) | 5 | 7 |

between nodes \(i, j\) . Along with \(\mathbf{A}\) , we can also define a feature matrix \(X \in \mathbb{R}^{M \times D}\) , where \(D\) is the dimensionality of the features, or equivalently we can say there are \(D\) node types.

Ignoring the transposed matrices, we have \(D = 27\) channels corresponding to features of individual amino acids, which can be arranged in a feature matrix \(X \in \mathbb{R}^{M \times D}\) . On the other hand, the pairwise features range from a minimum of \(K = 4\) in case (a) up to a maximum of \(K = 7\) in cases (g)-(i); these correspond to the edges of the protein graph, i.e., the adjacency matrices \(\mathbf{A} \in \mathbb{R}^{M \times M \times K}\) (see Table 3.1).

The input to the Graph Transformer is then given by the pair of tensors \(\{\mathbf{A}, X\}^{(i)}\) for each protein \(i\) in the dataset.
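The pair \(\{\mathbf{A}, X\}\) can be assembled by stacking the chosen per-residue and pairwise maps; a minimal sketch (function name ours) of the shapes involved:

```python
import numpy as np

def build_graph_inputs(node_feats, pairwise_maps):
    """Assemble the Graph Transformer input pair {A, X}:
    node_feats: list of D per-residue feature vectors, each of length M;
    pairwise_maps: list of K (M, M) maps (e.g. the base PDNET pairwise
    features plus the orientational maps of Table 3.1)."""
    X = np.stack(node_feats, axis=-1)     # node features, shape (M, D)
    A = np.stack(pairwise_maps, axis=-1)  # adjacency tensor, shape (M, M, K)
    assert X.shape[0] == A.shape[0] == A.shape[1]
    return A, X
```

For case (b) of Table 3.1, for instance, `pairwise_maps` would hold the 4 base pairwise features plus the cost map \(\mathbf{M}_{C_\alpha}\) , giving \(K = 5\) .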

3.6 Estimating orientational features

572cf014db14c8234f91c07d5a5a28c6_img.jpg

Diagram illustrating the architecture for predicting orientational features. The input consists of feature channels of size M x M x L. This is processed through a sequence of layers: Batch Norm, ReLU, Conv 3x3 (64 channels), Dropout, ReLU, and Conv 3x3 (L channels). A skip connection bypasses these layers and adds the input. The output is then processed by a final ReLU / tanh layer to produce the predicted orientational feature.

Fig. 3.15 Predicting orientational features. We employed the residual neural network of [2] to predict (a) cost maps \(\mathbf{M}_{C_\alpha}\) , (b) dot product maps \(\mathbf{N}_{C_\alpha}\) and (c) angle maps \(\Phi, \Psi, \Omega\) . While all (true) orientational features provide similar improvements when predicting coordinates, not all of them are as readily predicted.

Results presented in Section 3.8 onwards assume that the orientational features are built starting from ground truth coordinates of atoms in the protein. In a realistic

scenario, however, the coordinates are the end goal of the pipeline, not the starting point. We would then have to employ predicted orientational features, which are less clear than the ground truth features shown so far and must themselves be estimated from other features or multiple sequence alignments (MSA) of the amino acid sequence.

To this end, we employed the PDNET pipeline of [2], originally designed to predict distance maps, to compare how easily different orientational maps are predicted starting from the same set of features and the same pipeline. We attempted to predict cost maps \(\mathbf{M}_{C_\alpha}\) , dot product maps \(\mathbf{N}_{C_\alpha}\) and the three angle maps. The prediction pipeline is shown in Figure 3.15.

Examples of predictions are given in Figures 3.16-3.17: cost maps \(\mathbf{M}\) were consistently the most accurately predicted. This is likely because they are closely related to secondary structure information, which is implicitly included in other input features. Less accurate are the predictions of angle maps, which do not present easily recognizable patterns or, as in the case of \(\Psi\) , are asymmetric. A similar consideration can be made for dot product maps.

While all true orientational maps increase the coordinate prediction accuracy in a similar way, not all orientational maps are readily predicted, meaning that in a realistic PSP pipeline we will prefer features that are easier to estimate for the same added value in terms of GDT scores.

36f66d23ff319c73247feee55fd2dfb1_img.jpg

Fig. 3.16 Ground truth and predicted orientational maps for protein 1a3aA. The figure is a 2x5 grid of heatmaps, with columns labeled \(\Phi\) , \(\Psi\) , \(\Omega\) , \(\mathbf{N}\) , \(\mathbf{M}\) and rows labeled 'Ground Truth' and 'Prediction'. Ground truth maps for \(\Phi\) , \(\Omega\) and \(\mathbf{M}\) show clear, symmetric patterns; \(\Psi\) is asymmetric and \(\mathbf{N}\) is more complex and noisy. The predicted \(\Phi\) , \(\Omega\) and \(\mathbf{M}\) maps closely match their ground truth counterparts, the predicted \(\Psi\) is more symmetric than the ground truth, and the predicted \(\mathbf{N}\) is significantly less accurate, showing a diagonal band of high values instead of the complex ground truth pattern.

3.6.1 Training details

The loss of the PDNET pipeline has been chosen to be

\[\mathcal{L}_{\mathbf{X}} = \log(\cosh(\mathbf{X}_P - \mathbf{X}_T)) \quad (3.17)\]

![Fig. 3.17](423014446ca14cda32b393f915782615_img.jpg)

Fig. 3.17 Ground truth (top row) and predicted (bottom row) orientational maps \(\Phi\) , \(\Psi\) , \(\Omega\) , \(\mathbf{N}\) and \(\mathbf{M}\) for protein 1a70A.

where \(\mathbf{X}_P, \mathbf{X}_T\) are the predicted and true orientational maps in the training set, respectively, with \(\mathbf{X} = \{\mathbf{M}, \mathbf{N}, \Phi, \Psi, \Omega\}\) . The only difference between the approaches is the activation function of the last layer: when predicting \(\mathbf{N}\) , which ranges between \([-0.5, 0.5]\) , we employed a tanh activation instead of a ReLU.

The training features are identical to those of PDNET, namely a stack of images of the type \(\{\mathbf{Y}^{(i)}\}_{i=1}^N\) , with \(N = 57\) and \(\mathbf{Y}^{(i)} \in \mathbb{R}^{M \times M}\) , in which \(M\) is the length of the protein sequence. The loss is evaluated per pixel. The training set has been kept to 1000 proteins from the DEEPCOV dataset, and the testing set to 150 proteins from the PSICOV dataset, as in the original PDNET pipeline.
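As a minimal sketch (NumPy, with our own naming), the per-pixel log-cosh loss of Eq. 3.17 can be written as:

```python
import numpy as np

def log_cosh_loss(X_pred, X_true):
    """Per-pixel log-cosh loss of Eq. 3.17, averaged over the map.

    log(cosh(d)) behaves like d**2 / 2 for small errors and like
    |d| - log(2) for large ones, so it is a smooth compromise between
    L2 and L1 that is robust to outliers.
    """
    d = X_pred - X_true
    # np.logaddexp(d, -d) - log(2) is a numerically stable log(cosh(d))
    return np.mean(np.logaddexp(d, -d) - np.log(2.0))
```

For the dot product maps \(\mathbf{N}\) , ranging in \([-0.5, 0.5]\) , the same loss applies; only the final activation changes (tanh instead of ReLU), as noted above.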

3.7 PSP with GA features

3.7.1 Architecture: Graph Transformer + 3D projector

The end-to-end architecture, derived from [36], is composed of two parts: (1) a Graph Transformer and (2) a 3D projector. A summary of the architecture is shown in Fig. 3.18. We omitted the MSA Transformer of [36] as the employed dataset allows us to directly perform node and edge embedding on its features.

![Fig. 3.18](e5293502d8d5885d71defe1150aa12fc_img.jpg)

Fig. 3.18 The employed architecture. Features are recast in graph form to predict 3D backbone coordinates via a GrT + 3D projector architecture: feature channels and orientational features (a stack of \(M \times M\) maps) are combined and passed through node embedding ( \(X\) , \(M \times D\) ) and edge embedding ( \(\mathbf{A}\) , \(M \times M \times K\) ); the Graph Transformer outputs a node representation ( \(Z\) , \(M \times D\) ), which the 3D Projector maps to 3D coordinates ( \(M \times 3\) ).

The Graph Transformer

The GrT has been implemented as described in [199]. The goal of a GrT is to learn informative meta-paths within the graph, i.e., an ordered sequence of node types and edge types. The GrT also implements an attention mechanism, which is a function of the neighbourhood connectivity for each node in the graph [45]. The output \(Z \in \mathbb{R}^{M \times D}\) of the \(l\) -th layer of a GrT with \(C\) attention heads is a node representation with the same dimensionality as \(X\) , which can be written as

\[Z^{(l)} = \bigoplus_{i=1}^C \sigma(\tilde{\Delta}_i^{-1} \tilde{A}_i^{(l)} X W) \quad (3.18)\]

where:

  • \(\bigoplus\) is the concatenation operator,
  • \(\sigma(\cdot)\) is the sigmoid function,
  • \(\tilde{\Delta}_i\) is the degree matrix of \(\tilde{A}_i^{(l)}\) , defined as \((\tilde{\Delta}_i)_{mm} = \sum_n (\tilde{A}_i)_{mn}\) ,
  • \(X\) is the feature matrix,
  • \(W \in \mathbb{R}^{D \times D}\) is a trainable weight matrix, and
  • \(\tilde{A}_i^{(l)} = A_i^{(l)} + I\) , in which \(A_i^{(l)}\) is the adjacency matrix from the \(i\) -th channel of the metapath tensor \(\mathbf{A}^{(l)} \in \mathbb{R}^{M \times M \times C}\) .

\(\mathbf{A}^{(l)}\) is evaluated as \(\mathbf{A}^{(l)} = \Delta^{-1} \mathbf{Q}_1 \mathbf{Q}_2\) , where \(\mathbf{Q}_1\) and \(\mathbf{Q}_2\) , both \(\in \mathbb{R}^{M \times M \times C}\) , are two adjacency tensors selected according to:

\[\mathbf{Q} = \varphi[\mathbf{A}; \zeta(\mathbf{W}_\varphi)] \quad (3.19)\]

where \(\mathbf{A} \in \mathbb{R}^{M \times M \times K}\) is the adjacency tensor, \(\varphi(\cdot)\) is the convolution operator, \(\zeta(\cdot)\) is the softmax function and \(\mathbf{W}_\varphi \in \mathbb{R}^{C \times C \times K}\) are the weights of \(\varphi\) . \(Z\) contains the node representations from \(C\) different meta-path graphs.

The 3D Projector

The 3D projector is a simple fully connected layer obeying

\[P = Z^{(L)} W_P, \quad (3.20)\]

where \(Z^{(L)}\) is the output of the \(L\) -th layer of the GrT, \(W_P \in \mathbb{R}^{D \times 3}\) is the weight matrix of the projector and \(P \in \mathbb{R}^{M \times 3}\) are the 3D coordinates of the \(M C_\alpha\) atoms in the protein chain. To train the model, a distance map is evaluated for each protein from the predicted coordinates \(P\) as \(\tilde{D}_{ij} = d_{ij}\) , where \(d_{ij} = \|P_i - P_j\|_2\) is the Euclidean distance between the 3D coordinates of the \(i\) -th and \(j\) -th amino acid in \(P\) .

3.7.2 Training details

The total loss to minimize is equal to

\[\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 \quad (3.21)\]

in which the first term minimizes the \(L_1\) norm between \(\mathbf{D}\) (the ground truth distance map) and \(\tilde{\mathbf{D}}\) , as

\[\mathcal{L}_1 = \frac{1}{N^2} \sum_i^N \sum_j^N \|\tilde{D}_{ij} - D_{ij}\|_1 \quad (3.22)\]

The second term maximizes the structural similarity index (SSIM) between \(\mathbf{D}\) and \(\tilde{\mathbf{D}}\) weighted by an arbitrary coefficient \(\alpha = 10\) to make \(\mathcal{L}_2\) of the same order of magnitude as \(\mathcal{L}_1\) :

\[\mathcal{L}_2 = \alpha \left( 1 - SSIM\{\mathbf{D}, \tilde{\mathbf{D}}\} \right) = \alpha \left( 1 - \frac{(2\mu_{\tilde{\mathbf{D}}}\mu_{\mathbf{D}} + c_1)(2\sigma_{\tilde{\mathbf{D}}\mathbf{D}} + c_2)}{(\mu_{\tilde{\mathbf{D}}}^2 + \mu_{\mathbf{D}}^2 + c_1)(\sigma_{\tilde{\mathbf{D}}}^2 + \sigma_{\mathbf{D}}^2 + c_2)} \right) \quad (3.23)\]

where \(\mu_{\mathbf{D}}\) is the mean of \(\mathbf{D}\) , \(\mu_{\tilde{\mathbf{D}}}\) the mean of \(\tilde{\mathbf{D}}\) , \(\sigma_{\tilde{\mathbf{D}}\mathbf{D}}\) the covariance of \(\tilde{\mathbf{D}}\) and \(\mathbf{D}\) , \(\sigma_{\tilde{\mathbf{D}}}^2\) the variance of \(\tilde{\mathbf{D}}\) , \(\sigma_{\mathbf{D}}^2\) the variance of \(\mathbf{D}\) , \(c_1 = (k_1 L)^2\) and \(c_2 = (k_2 L)^2\) , with \(k_1 = 0.01\) , \(k_2 = 0.03\) and \(L\) the dynamic range, set to 255.
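A sketch of \(\mathcal{L}_2\) as written in Eq. 3.23, i.e. with global (whole-map) statistics rather than the windowed SSIM of common libraries:

```python
import numpy as np

def ssim_loss(D_pred, D_true, alpha=10.0, k1=0.01, k2=0.03, L=255.0):
    """L2 = alpha * (1 - SSIM(D, D~)) with single-window statistics (Eq. 3.23)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_p, mu_t = D_pred.mean(), D_true.mean()
    cov = ((D_pred - mu_p) * (D_true - mu_t)).mean()   # covariance of D~ and D
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (D_pred.var() + D_true.var() + c2)
    )
    return alpha * (1.0 - ssim)
```

Identical maps give SSIM = 1 and hence zero loss; \(\alpha = 10\) rescales the term to the magnitude of \(\mathcal{L}_1\) .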

Note how the loss is measured over distance maps and not over 3D coordinates as 3D coordinates depend on a reference frame, while distances are rotationally and translationally invariant. It is possible to include an orientational term in the loss (e.g., MAE between original and predicted cost maps or angle maps), but this would require more coordinates to be predicted rather than just the \(C_\alpha\) coordinates, which are enough for distance maps.

The model consists of 108813 trainable parameters, of which 108648 are from the GrT and 165 are from the projector. The training and testing sets are subsets of PDNET. The first is composed of 200 proteins, while for testing the accuracy of the model we tested it on 5 test sets of 150 proteins each (labelled D1, D2, etc.). The optimizer has been set to Adam with exponentially decaying learning rate, with initial learning rate \(\eta_0 = 1 \times 10^{-2}\) and decay rate per epoch \(\gamma = 0.9\) . The GrT has \(C = 4\) attention heads and 3 layers. The batch size has been fixed to \(B = 1\) and the network has been trained for \(E = 5\) epochs, for a total of 1000 training iterations.

Combinations of \(\eta \in \{1 \times 10^{-1}, 1 \times 10^{-2}, 1 \times 10^{-3}, 3 \times 10^{-4}\}\) , \(E \in \{3, 5, 10\}\) , \(B \in \{1, 50, 100\}\) , \(L \in \{3, 6, 10\}\) , \(C \in \{1, 4, 5\}\) , as well as \(\mathcal{L} = \mathcal{L}_1\) and additional layers in the 3D projector have also been implemented and tested, but the hyperparameters above were found to be optimal for our problem.

3.7.3 Alignment of predicted coordinates to ground truth

The predicted coordinates \(P\) are relative to a different reference frame compared to \(T\) , the ground truth coordinates from the PDB database. \(P\) and \(T\) must be aligned before evaluating the GDT scores. The goal of alignment is to find \(\mathbf{R} \in \mathbb{R}^{3 \times 3}\) , \(t \in \mathbb{R}^3\) such that \(P = \mathbf{R}T + t\) . In the GA case, we aim at finding the rotor \(R\) such that \(P = RT\tilde{R}\) . We do so by initially centering \(P, T\) at the origin (i.e. by placing their centres of mass at the origin) and then aligning the two point clouds via the characteristic multivector technique [100].

Results in Table 3.4 are obtained after aligning \(P, T\) . We picked the best GDT scores out of two different alignment procedures, namely (i) singular value decomposition (SVD) and (ii) GA-based alignment via characteristic multivectors. Results in Tables 3.2-3.3, as they refer to 750 proteins, have been obtained only through (i) as it is computationally faster. The two algorithms are summarized in Algorithms 2-3.

3.7.4 Results

For each protein of length \(M\) we obtained an \(M \times 3\) point cloud \(P\) of predicted 3D coordinates. We then aligned the predicted coordinates to the ground truth coordinates \(T\) (obtained from the Protein Data Bank (PDB) [24]) via singular value decomposition (SVD) (see Appendix A) and evaluated the GDT_TS (total score) and GDT_HA (high accuracy) between \(P\) and \(T\) as follows:

\[\text{GDT\_TS} = \frac{p_{<1\text{\AA}} + p_{<2\text{\AA}} + p_{<4\text{\AA}} + p_{<8\text{\AA}}}{4} \quad (3.24)\]

\[\text{GDT\_HA} = \frac{p_{<0.5\text{\AA}} + p_{<1\text{\AA}} + p_{<2\text{\AA}} + p_{<4\text{\AA}}}{4} \quad (3.25)\]

where \(p_{<n\text{\AA}}\) indicates the percentage of amino acids whose coordinates in \(P\) have a distance from the corresponding coordinates in \(T\) below \(n \text{ \AA}\) . Results for the 5 test sets D1 to D5 are summarized in Tables 3.2-3.3, consistently with prior art [3, 36].

Algorithm 2 SVD Alignment

\(i \leftarrow 0\)
\(M \leftarrow 10^3\)
while \(i \leq M\) do
     \(C_P \leftarrow \frac{1}{N} \sum_i^N P^i\)  ▷ compute centroids
     \(C_T \leftarrow \frac{1}{N} \sum_i^N T^i\)
     \(H \leftarrow (P - C_P)(T - C_T)^T\)  ▷ \(H \in \mathbb{R}^{3 \times 3}\)
     \(U, \Sigma, V^T \leftarrow \text{SVD}(H)\)  ▷ perform SVD
     \(\mathbf{R} \leftarrow VU^T\)
     \(t \leftarrow C_P - \mathbf{R}C_T\)
     \(T \leftarrow \mathbf{R}T + t\)  ▷ rotate and translate T
     \(i \leftarrow i + 1\)
end while
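A minimal NumPy sketch of the SVD alignment (we follow the standard Kabsch convention, taking \(T\) as the moving point cloud, and add the usual reflection check, which the pseudocode leaves implicit):

```python
import numpy as np

def svd_align(P, T):
    """Rigidly align T onto P (both (N, 3)) via SVD, as in Algorithm 2.

    Returns the rotation R, translation t and the aligned copy of T,
    so that R @ T_i + t is close to P_i.
    """
    C_P, C_T = P.mean(axis=0), T.mean(axis=0)   # centroids
    H = (T - C_T).T @ (P - C_P)                 # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = C_P - R @ C_T
    return R, t, T @ R.T + t
```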


Note that these results are intended to provide a relative comparison of different orientational features on a common, simple dataset like PDNET. They are not meant to compete with current state-of-the-art methods such as AlphaFold, which typically achieve GDT scores \(\geq 80\) [81].
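Once \(P\) and \(T\) are aligned, the scores of Eqs. 3.24-3.25 reduce to threshold counts; a sketch (distances in Å):

```python
import numpy as np

def gdt_scores(P, T):
    """GDT_TS and GDT_HA (Eqs. 3.24-3.25) for aligned coordinates (M, 3).

    p(n) is the percentage of residues whose predicted C-alpha lies
    within n Angstrom of its ground truth position.
    """
    d = np.linalg.norm(P - T, axis=1)           # per-residue deviation
    p = lambda n: 100.0 * np.mean(d < n)
    gdt_ts = (p(1) + p(2) + p(4) + p(8)) / 4    # total score
    gdt_ha = (p(0.5) + p(1) + p(2) + p(4)) / 4  # high accuracy
    return gdt_ts, gdt_ha
```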

The highest GDT score has been highlighted in bold and the second highest has been underlined.

Table 3.2 GDT_TS scores over the five datasets.

Case D1 D2 D3 D4 D5
max med. min max med. min max med. min max med. min max med. min
(a) no orientation 24.0 7.48 0.730 22.5 7.36 0.87 21.3 7.66 0.81 21.3 7.66 0.81 26.0 7.60 1.13
(b) \(\mathbf{M}_{C_\alpha}\) 30.0 11.6 1.71 28.7 11.4 1.87 36.3 12.5 2.25 32.2 12.1 1.55 32.8 12.5 1.58
(c) \(\mathbf{M}_{C_\alpha}, \mathbf{M}_{C_\beta}\) 38.5 11.7 1.70 35.2 11.7 1.77 38.7 12.2 1.89 34.5 11.5 2.33 38.6 12.9 1.42
(d) \(\mathbf{N}_{C_\alpha}\) 35.8 11.2 2.29 31.7 11.9 1.01 37.1 12.0 2.72 33.0 11.0 2.47 37.1 11.8 2.91
(e) \(\mathbf{M}_{C_\alpha}, \mathbf{N}_{C_\alpha}\) 39.6 11.5 2.58 36.1 12.3 2.00 29.7 12.1 2.21 29.0 11.9 1.09 38.3 12.1 0.13
(f) \(\mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}\) 32.2 11.6 2.35 34.1 11.8 1.20 32.6 12.3 1.03 33.6 11.9 2.12 31.3 12.7 3.05
(g) \(\mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}, \mathbf{N}_N\) 29.7 12.0 1.83 39.4 12.4 1.41 41.4 12.7 1.49 39.0 11.7 1.62 37.8 13.0 2.03
(h) \(\mathbf{M}_{C_\alpha}, \mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}\) 33.9 11.7 2.71 31.9 12.0 0.89 29.8 12.9 1.79 30.7 12.4 2.03 43.1 11.2 0.66
(i) \(\Omega, \Phi, \Psi\) 33.2 12.3 2.29 33.0 13.1 1.42 32.3 12.7 1.58 29.6 12.1 1.82 32.8 13.8 1.82

There are several things to note here: firstly, by adding orientational information, an improvement in the quality of the predicted protein coordinates of at least \(\sim 4\%\) compared to the approach without orientational information can be measured, as found in [197].

Secondly, for all the analyzed cases (b)-(i) the relative improvement is generally \(< 2\%\) , which implies that no clearly superior approach to modeling the amino acid

Algorithm 3 GA Alignment

\(i \leftarrow 0\)
\(M \leftarrow 10^3\)
while \(i \leq M\) do
     \(C_P \leftarrow \frac{1}{N} \sum_i^N P^i\)  ▷ compute centroids
     \(C_T \leftarrow \frac{1}{N} \sum_i^N T^i\)
     \(F \leftarrow (P - C_P)(T - C_T)^T\)
     \(G \leftarrow (T - C_T)(T - C_T)^T\)  ▷ \(F, G \in \mathbb{R}^{3 \times 3}\)
     \(f_1, f_2, f_3 \leftarrow F_{:,1}, F_{:,2}, F_{:,3}\)  ▷ extract columns of F, G
     \(g_1, g_2, g_3 \leftarrow G_{:,1}, G_{:,2}, G_{:,3}\)
     \(f^1 \leftarrow (f_2 \wedge f_3)/(f_1 \wedge f_2 \wedge f_3)\)  ▷ reciprocal frame
     \(f^2 \leftarrow (f_1 \wedge f_3)/(f_1 \wedge f_2 \wedge f_3)\)
     \(f^3 \leftarrow (f_1 \wedge f_2)/(f_1 \wedge f_2 \wedge f_3)\)
     \(X \leftarrow 1 + (f^1 g_1 + f^2 g_2 + f^3 g_3) + [(f^2 \wedge f^1)(g_1 \wedge g_2) + (f^3 \wedge f^2)(g_2 \wedge g_3) + (f^3 \wedge f^1)(g_1 \wedge g_3)] + (f^3 \wedge f^2 \wedge f^1)(g_1 \wedge g_2 \wedge g_3)\)
     \(\alpha \leftarrow X\tilde{X}\)
     \(R \leftarrow X/\sqrt{\alpha}\)
     \(T \leftarrow RT\tilde{R}\)  ▷ rotate T
     \(i \leftarrow i + 1\)
end while


Table 3.3 GDT_HA scores over the five datasets.

Case D1 D2 D3 D4 D5
max med. min max med. min max med. min max med. min max med. min
(a) no orientation 9.12 1.61 0.000 7.41 1.81 0.000 8.02 1.74 0.100 8.02 1.74 0.100 8.33 1.80 0.000
(b) \(\mathbf{M}_{C_\alpha}\) 10.3 2.55 0.080 10.1 2.42 0.350 15.1 2.68 0.310 11.6 2.05 0.130 13.1 2.78 0.000
(d) \(\mathbf{N}_{C_\alpha}\) 15.3 2.68 0.220 11.1 3.00 0.140 16.1 2.82 0.130 12.1 2.60 0.510 16.0 2.88 0.590
(e) \(\mathbf{M}_{C_\alpha}, \mathbf{N}_{C_\alpha}\) 15.8 2.96 0.200 12.5 2.97 0.210 11.8 2.72 0.280 9.20 2.85 0.000 15.9 2.12 0.000
(f) \(\mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}\) 11.4 2.20 0.180 10.2 2.07 0.100 10.2 2.22 0.340 11.2 2.27 0.000 12.6 2.62 3.05
(g) \(\mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}, \mathbf{N}_N\) 10.4 2.56 0.280 17.0 2.57 0.200 17.7 3.03 0.110 16.4 2.69 0.170 15.5 2.94 0.340
(h) \(\mathbf{M}_{C_\alpha}, \mathbf{N}_{C_\alpha}, \mathbf{N}_{C_\beta}\) 12.7 2.59 0.330 11.4 3.06 0.200 10.1 2.85 0.280 9.41 2.64 0.130 21.1 2.71 0.000
(i) \(\Omega, \Phi, \Psi\) 13.8 2.93 0.230 11.6 2.51 0.000 12.4 3.20 0.280 10.0 2.83 0.300 13.4 2.91 0.300

orientation exists. This means that, as long as orientational information is added, an improvement is going to be seen regardless of how this information is encoded.

However, the orientational information encoded through our GA-based metrics generally requires much less information about the protein backbone than encoding orientation as angle maps does. Compare, for example, case (d) with (i) in Table 3.3: in case (d) we have a single dot product map \(\mathbf{N}_{C_\alpha}\) ( \(K = 5\) ), constructed over oriented

Table 3.4 GDT_TS score for 10 example proteins (GDT_HA score in parenthesis)

Protein ID (a) (b) (c) (d) (e) (f) (g) (h) (i)
1mk0A 15.5 (6.44) 23.5 (5.15) 14.4 (2.58) 14.9 (4.90) 16.5 (3.35) 22.7 ( 8.25 ) 30.9 ( 7.99 ) 21.1 ( 7.99 ) 18.8 (6.70)
1lyhA 9.37 (3.12) 16.3 (3.85) 16.1 (4.08) 12.3 (1.68) 19.5 ( 6.97 ) 24.0 (6.73) 21.9 (4.57) 21.2 (6.94) 20.9 ( 8.65 )
1zv1A 22.9 (5.93) 21.6 (6.35) 28.8 ( 8.90 ) 22.9 ( 9.74 ) 20.3 (8.89) 20.3 (3.81) 25.8 (7.63) 28.8 (5.93) 19.5 (6.36)
2ddoB 13.0 (4.17) 11.8 (2.31) 13.7 (3.24) 17.1 (5.55) 15.7 (5.55) 19.9 (5.78) 25.7 ( 8.10 ) 25.5 ( 7.87 ) 20.4 (7.41)
2dgbA 9.04 (2.71) 13.9 (2.71) 16.9 (3.01) 18.4 ( 6.32 ) 23.5 (6.93) 20.2 (6.02) 17.5 (5.42) 13.3 (4.22) 19.9 (6.48)
2dim9A 12.3 (2.97) 15.7 (4.24) 18.6 (4.45) 16.7 ( 4.66 ) 18.9 ( 7.20 ) 13.1 (3.60) 14.2 (2.33) 15.0 (4.24) 10.6 (2.12)
2chiW 8.69 (1.09) 10.9 (2.83) 10.7 (2.83) 12.4 (3.26) 11.7 (3.48) 10.2 (2.82) 11.1 (2.61) 15.0 ( 3.26 ) 22.2 ( 7.17 )
2fyuK 10.4 (1.89) 15.1 (5.66) 17.0 (5.66) 29.7 ( 9.43 ) 18.9 (5.19) 16.5 (2.83) 26.9 ( 6.13 ) 14.6 (4.72) 24.1 (7.55)
2fztA 14.7 (2.24) 26.0 ( 7.37 ) 10.6 (2.24) 27.9 ( 8.33 ) 19.2 (6.09) 15.1 (4.49) 25.0 (6.73) 17.6 (4.17) 21.5 (3.52)
2gomA 19.7 (7.79) 26.6 (9.43) 27.5 (6.56) 25.0 ( 9.84 ) 28.3 (9.43) 25.0 (6.56) 27.9 (8.20) 27.5 (9.02) 25.0 (11.1)

points at position \(C_\alpha\) in the plane specified by the \(N - C_\alpha - C\) triplet, that yields comparable results to case (i), in which angle maps are three additional features ( \(K = 7\) ) constructed over a total of 5 different planes.

This indicates two things: (1) GA-based metrics can distill more information about the relative orientation of amino acids from fewer geometrical objects, and (2) this information can be condensed into fewer features while remaining equivalent to the 3 angle maps approach.

Examples of GDT scores for 10 proteins are given in Table 3.4. Again, it can be seen that no approach is consistently superior for randomly selected proteins. Examples of predicted coordinates for selected cases are given in Figures 3.19-3.21.

Geometrical meaning of features

The fact that our GA approach yields comparable results to angle maps by employing fewer features obtained from less geometrical information can be explained by how the information is packed in GA-based features: it is possible to establish a relationship between the secondary structure and the patterns in the cost maps. By secondary structure we refer to local folding patterns of a protein, most commonly \(\alpha\) -helices and \(\beta\) -sheets.

We illustrate this relationship by assigning an arbitrary colour to each secondary structure: red to \(\alpha\) -helices, green to \(\beta\) -sheets, blue to turns and white to all the others. In Fig. 3.22 we see an almost one-to-one correspondence between colour patches of secondary structures and patterns in cost maps. This shows how cost maps, despite being built starting from a single plane in the backbone ( \(N - C_\alpha - C\) ), contain information about the protein folding that is among the most relevant in PSP pipelines, namely secondary structures.

To the best of our knowledge, this is the first example of a single orientational map that clearly matches the secondary structures.

![Fig. 3.19](1ef2b5e267d1196dd7375ea783d5ff59_img.jpg)


Case GDT_TS GDT_HA MAE SSIM
(f) 25.00 9.84 5.83 0.689
(h) 27.46 9.02 3.07 0.768
(i) 25.00 11.07 3.56 0.695


Fig. 3.19 Results for protein 2gomA. The original 3D protein model is shown in green with the original distance map \(D\) below. The ground truth and predicted coordinates \(T, P\) are given on the top row for selected cases in red and blue, respectively, with their corresponding GDT scores. Below, the distance map \(\tilde{D}\) built from \(P\) and the mean absolute error (MAE) and structural similarity index measure (SSIM) measured with respect to \(D\) .

![Fig. 3.20](cbea5bcd5880c40eb83c784096a83c2d_img.jpg)


Case GDT_TS GDT_HA MAE SSIM
(f) 23.80 6.49 3.64 0.746
(h) 21.15 6.49 3.93 0.685
(i) 16.83 4.09 4.72 0.663


Fig. 3.20 Results for protein 1yqhA. The original 3D protein model is shown in red with the original distance map \(D\) below. The ground truth and predicted coordinates \(T, P\) are given on the top row for selected cases in red and blue, respectively, with their corresponding GDT scores. Below, the distance map \(\tilde{D}\) built from \(P\) and the MAE and SSIM measured with respect to \(D\) .

![Fig. 3.21](49449978dcae2b2d6a2cc20c67b3d536_img.jpg)

Case GDT_TS GDT_HA MAE SSIM
(g) 34.31 13.72 3.05 0.780
(h) 40.20 18.14 2.38 0.827
(i) 26.96 9.80 3.75 0.745


Fig. 3.21 Results for protein 1z0jB. The original 3D protein model is shown in yellow with the original distance map \(\mathbf{D}\) below. The ground truth and predicted coordinates \(T, P\) are given on the top row for selected cases in red and blue, respectively, with their corresponding GDT scores. Below, the distance map \(\tilde{\mathbf{D}}\) built from \(P\) and the MAE and SSIM measured with respect to \(\mathbf{D}\) .

![Fig. 3.22](f88c98ece93f791a5eb0a524ccc85b54_img.jpg)

Fig. 3.22 Orientational maps \(\Phi\) , \(\Psi\) , \(\Omega\) , \(\mathbf{N}\) , \(\mathbf{M}\) (top row) and secondary structures (bottom row). Helices (red) and sheets (green) can be easily spotted from the patterns in the cost map \(\mathbf{M}\) .

3.8 PSP with equivariant layers in 3D GA

A major shortcoming of many PSP approaches is that the geometric information about the proteins, which is crucial, always needs to be flattened (i.e. scalarized) in order to be fed into and interpreted by DL architectures. Geometric deep learning can be employed to overcome this issue and preserve the geometric nature of the data. Several examples are given in [57, 51, 107, 36, 145], in which proteins are represented as graphs, where nodes correspond to amino acids, and edges represent interactions between them (e.g., bonds or contacts). In this Section we show how to implement a geometric machine learning pipeline built upon Clifford Group Equivariant Neural Network (CGENN) layers, which explicitly work with objects in Clifford Algebra and can perform equivariant transformation on them.

3.8.1 GA-equivariant 3D projector

We employ CGENN layers as first introduced in [158]. CGENNs (which we indicate with \(\phi\) ) are networks built upon equivariant layers that operate on multivectors of a Clifford Algebra in any dimension in an \(E(n)\) -equivariant way, i.e. equivariant over the \(n\) -dimensional Euclidean space. This means that when an orthogonal transformation \(\rho(w)\) is applied to the input data \(x\) , the model's representations corotate, i.e.

\[\phi(\rho(w)(x)) = \rho(w)(\phi(x)). \quad (3.26)\]

Operating on the transformed data is the same as operating on the data and then transforming: for physical transformations this would be termed covariance. The equivariance of CGENNs is particularly desirable in PSP problems, since ground truth protein coordinates sit in an arbitrary reference frame which differs for each protein chain. Moreover, CGENNs directly transform data in a vector basis, offering a better geometric interpretation of the network's intermediate outputs in terms of folding.
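Eq. 3.26 can be checked numerically even for a toy layer: any channel-mixing with scalar weights commutes with a rotation applied to every channel (all names below are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # 4 vector-grade channels in 3D
w = rng.normal(size=(4, 4))            # scalar channel-mixing weights
phi = lambda v: w @ v                  # grade-preserving linear layer

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal rho(w)
rho = lambda v: v @ Q.T                # rotate every channel

# corotation: transforming then operating == operating then transforming
assert np.allclose(phi(rho(x)), rho(phi(x)))
```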

We wish to use CGENNs in the projector step of the architecture employed in Section 3.7 (see Fig. 3.23). The 3D projector is responsible for transforming, or projecting, the nodes output by the GrT onto 3D Euclidean space.

We employ two types of CGENN layers, namely (i) multivector linear (MVL) layers and (ii) fully connected geometric product (FCGP) layers, within the 3D projector, and compare them to fully connected linear (L) layers.

![Fig. 3.23](235e996e83e7d5198cf9d909f5713cdd_img.jpg)

The diagram illustrates the architecture of the CGENN layers within the 3D Projector. At the top, feature channels are processed into node embeddings \(X\) ( \(M \times D\) ) and edge embeddings \(A\) ( \(M \times M \times K\) ). These are input to a Graph Transformer. The output is a node representation \(Z\) ( \(M \times D\) ), which is then processed by a 3D Projector to yield 3D coordinates ( \(M \times 3\) ). The bottom part of the diagram shows the internal structure of the CGENN layers. Node embeddings ( \(M \times D\) ) are reshaped to \(M \times 3 \times D/3\) and processed by a \(GA(3,0)\) embedding block, resulting in \(M \times 8 \times D/3\) representations. These are then processed by a series of layers, with the final output being \(M \times 8 \times 3\) . A neuron is defined as having 1 scalar, 3 vector, 3 bivector, and 1 trivector components. This is followed by a grade 1 extraction block ( \(M \times 8 \times 1\) ) to produce the final predicted coordinates ( \(M \times 3\) ), which are compared against the ground truth coordinates.


Fig. 3.23 CGENN layers employed within the 3D Projector in a Protein Structure Prediction pipeline. A protein with \(M\) amino acids can be represented as a graph with \(M\) nodes, \(D\) -dimensional node features and \(K\) edge feature channels. The Graph Transformer extracts a new node representation \(M \times D\) , while the 3D Projector maps the representation onto 3D space to obtain \(M \times 3\) 3D coordinates. In CGENNs, neurons, weights and biases are multivectors, in our case in \(\mathcal{G}_{3,0,0}\) , whose vector part can be interpreted geometrically.

Given a set of multivectors \(\{x_i\}_{i=1}^C\) , with \(C\) input channels, the output \(z_j\) of the \(j\) -th channel of a MVL layer is given by:

\[\langle z_j \rangle_k = \sum_{i=1}^C \phi_{ijk} \langle x_i \rangle_k, \quad (3.27)\]

in which \(\langle \cdot \rangle_k\) is the extraction of the grade \(k\) elements in multivectors \(x\) , \(z\) and \(\phi_{ijk} \in \mathbb{R}\) is a learnable weight.
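Eq. 3.27 amounts to a separate linear mix of channels for each grade; a sketch with multivectors of \(\mathcal{G}_{3,0,0}\) stored as 8 coefficients (1 scalar, 3 vector, 3 bivector, 1 trivector):

```python
import numpy as np

GRADES = [slice(0, 1), slice(1, 4), slice(4, 7), slice(7, 8)]  # k = 0..3

def mvl_layer(x, phi):
    """Multivector linear layer of Eq. 3.27.

    x:   (C_in, 8) input multivector channels
    phi: (C_in, C_out, 4) one scalar weight per (in channel, out channel, grade)
    Since grades never mix, the map commutes with orthogonal
    transformations, which is what makes the layer equivariant.
    """
    z = np.zeros((phi.shape[1], 8))
    for k, sl in enumerate(GRADES):
        # <z_j>_k = sum_i phi_ijk <x_i>_k
        z[:, sl] = np.einsum('io,ig->og', phi[:, :, k], x[:, sl])
    return z
```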

The FCGP layer, on the other hand, models interaction terms between pairs of multivectors. Given a learnable linear combination of the inputs

\[y_i = \sum_{p=1}^C \beta_{pi} x_p, \quad (3.28)\]

the output of the \(j\) -th channel, \(z_j\) , obeys

\[\langle z_j \rangle_k = \sum_{i=1}^C \sum_{p=1}^C \phi_{ijk} \langle x_i (\beta_{pi} x_p) \rangle_k. \quad (3.29)\]

Both \(\phi_{ijk} \in \mathbb{R}\) and \(\beta_{pi} \in \mathbb{R}\) are learnt, explaining the higher number of parameters for the FCGP layers compared to the MVL layers in Tables 3.5-3.6.

The 3D projector models a function \(g\) such that

\[P = g(Z^{(L)}), \quad (3.30)\]

where \(Z^{(L)} \in \mathbb{R}^{M \times D}\) is the output of the \(L\) -th layer of the GrT, with \(D\) being the node embedding dimension, \(P \in \mathbb{R}^{M \times 3}\) are the 3D coordinates of the \(M\) \(C_\alpha\) atoms in the protein chain and \(g\) depends on the type of layer chosen.

When the 3D projector is a fully connected layer, as in [36, 145], the function \(g(\cdot)\) is parametrized by a weight matrix \(W_P \in \mathbb{R}^{D \times 3}\) . When CGENNs are employed, on the other hand, some extra steps have to be considered:

  • Reshape \(Z^{(L)} \in \mathbb{R}^{M \times D}\) into \(Z^{(L)} \in \mathbb{R}^{M \times (D/3) \times 3}\) , so that we can geometrically interpret the output of the GrT as \(D/3\) proposals of 3D Euclidean coordinates,

  • Embed the reshaped GrT output into the \(\mathcal{G}_{3,0,0}\) algebra (i.e. assign the 3D coordinates to the basis vectors \(\{e_1, e_2, e_3\}\) ) so as to obtain an input tensor \(X_{in} \in \mathbb{R}^{M \times (D/3) \times 8}\) representing multivectors \(x \in \mathcal{G}_{3,0,0}\) with 8 real coefficients (1 scalar, 3 vector, 3 bivector, 1 trivector) and only the vector part non-zero,

  • Downsample the multivector proposals with one or more CGENN layers, operating according to Eq. 3.27 for the MVL layer or Eq. 3.29 for the FCGP layer, until obtaining an output tensor \(X_{out} \in \mathbb{R}^{M \times 1 \times 8}\) ,

  • Extract grade 1 elements from the obtained multivectors, corresponding to the vector part, i.e. the protein coordinates in 3D Euclidean space \(P \in \mathbb{R}^{M \times 1 \times 3}\) .
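The four steps above, sketched end-to-end for a single MVL down-sampling layer (weight names are ours; Eq. 3.27 is written out only for the vector grade, the sole non-zero grade of the embedding):

```python
import numpy as np

def ga_project(Z, phi):
    """CGENN-style 3D projector (steps 1-4 above, sketched).

    Z:   (M, D) Graph Transformer output, with D divisible by 3
    phi: (D // 3,) scalar MVL weights down-sampling D/3 channels to 1
    Returns the predicted coordinates P of shape (M, 3).
    """
    M, D = Z.shape
    proposals = Z.reshape(M, D // 3, 3)        # (1) D/3 coordinate proposals
    X_in = np.zeros((M, D // 3, 8))            # (2) embed in G(3,0,0):
    X_in[:, :, 1:4] = proposals                #     only vector part non-zero
    X_out = np.einsum('c,mcg->mg', phi, X_in)  # (3) MVL down-sampling
    return X_out[:, 1:4]                       # (4) extract grade-1 part
```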

3.8.2 Triple-wise fully connected geometric product layer

On top of the two existing CGENN layers, we add a third one, which implements a type of fully connected geometric product layer \(\psi\) to model interactions between triples. Its output follows:

\[\langle \mathbf{z}_j \rangle_k = \sum_{i=1}^C \psi_{ijk} \langle \mathbf{x}_i (\mathbf{y}_i + \bar{\mathbf{y}}_i) \rangle_k = \sum_{i=1}^C \sum_{p=1}^C \sum_{q=1}^C \psi_{ijk} \langle \mathbf{x}_i (\alpha_{pi} \mathbf{x}_p + \beta_{qi} \mathbf{x}_q \alpha_{pq} \mathbf{x}_p) \rangle_k \quad (3.31)\]

in which

\[\mathbf{y}_i = \sum_{p=1}^C \alpha_{pi} \mathbf{x}_p, \quad (3.32)\]

and

\[\bar{\mathbf{y}}_i = \sum_{q=1}^C \beta_{qi} \mathbf{x}_q \mathbf{y}_q, \quad (3.33)\]

in which \(\psi_{ijk}, \alpha_{pi}, \beta_{qi} \in \mathbb{R}\) are trainable parameters. We refer to the pairwise and triple-wise layers as the P-FCGP layer (P for pairwise) and the T-FCGP layer (T for triples), respectively. We verified that the T-FCGP layer is indeed equivariant (see Fig. 3.24).

We tested a total of 8 approaches: (a) 1 linear layer (27 nodes to 3); (b) 1 MVL layer (9 3D structures to 1); (c) 1 P-FCGP layer (9 3D structures to 1); (d) 1 T-FCGP layer (9 3D structures to 1); (e) 2 linear layers (27 nodes to 9 to 3); (f) 2 MVL layers (9 3D structures to 3 to 1); (g) 2 P-FCGP layers (9 3D structures to 3 to 1) and (h) 2 T-FCGP layers (9 3D structures to 3 to 1).

3.8.3 Results

Results are summarized in Tables 3.5-3.6. Compared to linear layers, CGENNs increase the median GDT_TS by up to 3% and 2.1% with one and two layers, respectively. Similarly, the median GDT_HA increases by 1.1% and 1.6% with one and two layers, respectively. Note how for the MVL layers, an improvement is achieved despite

ee9b6704de74d58f8d805dfc0e22a2d3_img.jpg

Figure 3.24 displays four bar charts illustrating the effect of a random orthogonal transformation \(p(w)\) and a T-FCGP layer \(\psi\) on a random multivector \(\mathbf{x}\) . The charts show the values of the multivector components for basis elements \(1, e_1, e_2, e_3, e_{12}, e_{13}, e_{23}, e_{123}\) .

  • **Top Left:** \(p(w)(\mathbf{x})\). The values are approximately: \(1: 0.6, e_1: 0.8, e_2: 0.2, e_3: -0.4, e_{12}: 0.0, e_{13}: -0.3, e_{23}: -1.5, e_{123}: 0.1\).
  • **Top Right:** \(\psi(\mathbf{x})\). The values are approximately: \(1: -0.6, e_1: 0.1, e_2: 0.5, e_3: 0.3, e_{12}: 0.3, e_{13}: 0.6, e_{23}: -0.3, e_{123}: -1.0\).
  • **Bottom Left:** \(\psi(p(w)(\mathbf{x}))\). The values are approximately: \(1: -0.6, e_1: 0.1, e_2: 0.5, e_3: 0.3, e_{12}: 0.3, e_{13}: 0.6, e_{23}: -0.3, e_{123}: -1.0\).
  • **Bottom Right:** \(p(w)(\psi(\mathbf{x}))\). The values are approximately: \(1: -0.6, e_1: 0.1, e_2: 0.5, e_3: 0.3, e_{12}: 0.3, e_{13}: 0.6, e_{23}: -0.3, e_{123}: -1.0\).

The bottom two charts show that \(\psi(p(w)(\mathbf{x})) = p(w)(\psi(\mathbf{x}))\) , demonstrating the equivariance of the T-FCGP layer.


Fig. 3.24 Given a random multivector \(\mathbf{x}\) , random orthogonal transformation \(p(w)\) and our T-FCGP layer \(\psi\) , \(\psi(p(w)(\mathbf{x}))\) is equivalent to \(p(w)(\psi(\mathbf{x}))\) , hence \(\psi\) is equivariant.

dd87a668e75920eb7a273a05f302aac6_img.jpg

Figure 3.25 displays two line graphs showing the Train Loss (Y-axis, ranging from 5 to 25) versus Epochs (X-axis, ranging from 0 to 25) for the GrT + 3D projector architecture. The graphs compare the performance of different CGENN layers: P-FCGP (red line with circles), T-FCGP (black line with squares), MVL (green line with triangles), and L (blue line with diamonds).

  • **Left Graph (Train Loss, single layer):** T-FCGP and MVL show the lowest loss, stabilizing around 5-6. P-FCGP and L show higher loss, stabilizing around 7-8.
  • **Right Graph (Train Loss, two layers):** T-FCGP and MVL show the lowest loss, stabilizing around 5-6. P-FCGP and L show higher loss, stabilizing around 7-8.


Fig. 3.25 Train loss for the GrT + 3D projector architecture for different CGENN layers in the projector.

be9c18a7b64777fce99b4309bdcea4ea_img.jpg

Figure 3.26: Validation loss for the GrT + 3D projector architecture for different CGENN layers in the projector. The figure contains two line plots. The left plot is titled 'Val Loss (single layer)' and the right plot is titled 'Val Loss (two layers)'. Both plots show 'Loss' on the y-axis (ranging from 5.0 to 20.0) against 'Epochs' on the x-axis (ranging from 0 to 30). Four methods are compared: P-FCGP (red dashed line with circles), T-FCGP (black dashed line with diamonds), MVL (green dashed line with squares), and L (blue dashed line with triangles). In both plots, all methods show a rapid decrease in loss over the first 10 epochs, followed by a plateau. The 'two layers' plot shows lower overall loss values compared to the 'single layer' plot.

Fig. 3.26 Validation loss for the GrT + 3D projector architecture for different CGENN layers in the projector.

a reduction in the number of trainable parameters, which is quite significant when the 3D projector is composed of 2 layers. This trend is mirrored in the loss profiles: train and validation losses for the 8 different approaches to the 3D projector are presented in Figures 3.25-3.26, respectively. In both figures, CGENN layer losses reach lower minima than linear layers, despite a slower convergence. The distributions over the test set of MAE and SSIM measured between \(\mathbf{D}\) and \(\hat{\mathbf{D}}\) are shown in Figures 3.27a-3.27b, respectively. The lowest MAE is always obtained via CGENN layers. The highest median MAE, on the other hand, is achieved by approach (a), i.e. the single linear layer of [36, 145]. Similarly, the highest median SSIM over the test set is obtained via approaches (d) and (h), while the lowest via approach (a).

It looks like interactions between triples allow for better generalization on the test set despite similar or higher validation loss compared to other approaches. This yields higher median GDT scores, both TS and HA, for both single and double layer projectors, and it is also mirrored in the metrics employed in the loss, namely MAE and SSIM. This could mean that, for more complex problems, triple-wise interactions are more suitable to model with a neural network.

GDT scores for selected proteins are listed in Table 3.4. Note how the highest GDT scores are always obtained through CGENN layers.

We believe that the main advantage of our approach lies in its geometric interpretability. In Figure 3.28 an example of input and outputs of the CGENN layers is given for protein 2ehwA. The \(D/3\) coordinate proposals for case (f), in our case 9, are fed as input to the MVL layer (top). Note how the 3D coordinates, despite being still far from the correct folding, can be used to construct distance maps which are already close to ground truth, and hence fully interpretable as protein chains. The 3 outputs from the 1st MVL layer are shown in the middle section of Figure 3.28: the MAE decreases, while SSIM and GDT scores increase. The 3D structures “unfold”. Finally, the output of the last layer is shown in the bottom section of Figure 3.28. It is not possible to have the same interpretability with linear layers.

12b521317c2c8ea33a1382d83e2b76be_img.jpg

Figure 3.27 consists of two bar charts comparing 'single layer' (blue) and 'double layer' (orange) models across four categories: T-FCGP, P-FCGP, MVL, and L. Both charts include error bars.

(a) Median MAE (Test set): The y-axis represents MAE from 3.0 to 4.5. For T-FCGP, single layer is ~3.3 and double layer is ~3.2. For P-FCGP, single layer is ~4.1 and double layer is ~3.0. For MVL, single layer is ~4.5 and double layer is ~3.3. For L, single layer is ~4.6 and double layer is ~4.0.

(b) Median SSIM (Test set): The y-axis represents SSIM from 0.65 to 1.00. For T-FCGP, single layer is ~0.95 and double layer is ~0.97. For P-FCGP, single layer is ~0.62 and double layer is ~0.78. For MVL, single layer is ~0.62 and double layer is ~0.78. For L, single layer is ~0.62 and double layer is ~0.74.


Fig. 3.27 Median metrics with error bars over the test set measured between ground truth distance maps and distance maps built on predicted coordinates.

Table 3.5 GDT_TS scores over the PSICOV150 dataset for different 3D projection strategies.

| 3D projector type | maximum | median | minimum | number of parameters |
|---|---|---|---|---|
| (a) L layer | 28.69 | 15.15 | 9.07 | 150 |
| (b) MVL layer | 34.29 | 16.98 | 6.59 | 142 |
| (c) P-FCGP layer | 33.20 | 18.13 | 5.35 | 682 |
| (d) T-FCGP layer | 52.50 | 18.77 | 5.56 | 1026 |
| (e) 2 L layers | 36.06 | 18.76 | 6.61 | 421 |
| (f) 2 MVL layers | 48.01 | 20.81 | 7.68 | 232 |
| (g) 2 P-FCGP layers | 58.04 | 20.56 | 5.45 | 1240 |
| (h) 2 T-FCGP layers | 48.57 | 21.96 | 9.14 | 1572 |

3.9 Conclusions

This chapter dealt with GA in PSP problems.

The first part focused on the issue of orientational features in PSP pipelines. We employed GA as a tool to model the protein backbone as a rigid body and built two novel features from it, which we named cost maps and oriented point maps, and compared their impact on PSP accuracy side by side with traditional angle maps. We verified that adding GA-based orientational features improves the accuracy in terms of GDT scores in a similar way to angle maps, but it requires less geometrical information about the protein backbone, generally from 1 up to 3 planes compared to 5 planes required by angle maps. This shows that GA condenses orientational information of

Table 3.6 GDT_HA scores over the PSICOV150 dataset for different 3D projection strategies.

| 3D projector type | maximum | median | minimum | number of parameters |
|---|---|---|---|---|
| (a) L layer | 11.06 | 4.27 | 0.74 | 150 |
| (b) MVL layer | 13.52 | 4.49 | 0.58 | 142 |
| (c) P-FCGP layer | 14.14 | 5.38 | 0.78 | 682 |
| (d) T-FCGP layer | 27.75 | 6.29 | 0.87 | 1026 |
| (e) 2 L layers | 15.98 | 5.17 | 0.74 | 421 |
| (f) 2 MVL layers | 23.01 | 6.77 | 1.42 | 232 |
| (g) 2 P-FCGP layers | 33.62 | 6.79 | 0.61 | 1240 |
| (h) 2 T-FCGP layers | 25.38 | 7.97 | 1.37 | 1572 |

a449d914717a90c16526f166080595c6_img.jpg

Figure 3.28: Input and outputs of the CGENN layers for the prediction of the 3D structure of protein 2ehwA for approach (e). The figure shows a sequence of 10 rows of visualizations. Each row contains a 3D ribbon model of the protein structure (input in grey, output in red) and a corresponding 2D distance map (ground truth in grey, prediction in red). The metrics for each row are: Row 1: MAE: 17.6, SSIM: 0.12, GDT_TS: 5.0; Row 2: MAE: 19.9, SSIM: 0.00, GDT_TS: 5.0; Row 3: MAE: 17.0, SSIM: 0.17, GDT_TS: 3.7; Row 4: MAE: 15.2, SSIM: 0.23, GDT_TS: 8.5; Row 5: MAE: 18.4, SSIM: 0.10, GDT_TS: 5.2; Row 6: MAE: 15.4, SSIM: 0.26, GDT_TS: 7.8; Row 7: MAE: 16.2, SSIM: 0.18, GDT_TS: 5.2; Row 8: MAE: 20.1, SSIM: 0.03, GDT_TS: 4.3; Row 9: MAE: 14.8, SSIM: 0.31, GDT_TS: 5.9; Row 10: MAE: 17.1, SSIM: 0.19, GDT_TS: 4.8. Below these are three more rows: Row 11: MAE: 13.6, SSIM: 0.29, GDT_TS: 7.2; Row 12: MAE: 14.8, SSIM: 0.31, GDT_TS: 5.9; Row 13: MAE: 2.4, SSIM: 0.85, GDT_TS: 38.0. The final row shows the 'ground truth' and 'prediction' distance maps side-by-side.

Fig. 3.28 Input and outputs of the CGENN layers for the prediction of the 3D structure of protein 2ehwA for approach (e). The outputs (in red) are also 3D structures, with sensible distance maps close to ground truth (in grey).

proteins in fewer, more informative features. Moreover, we showed (i) how patterns in cost maps can be immediately associated with the protein secondary structures, which is what determines the overall folding and is one of the most relevant features in PSP problems, and (ii) how cost maps are easily predictable compared to other features.

In the second part we applied CGENN layers to predict 3D structures of proteins. We showed how CGENN layers, which are \(E(3)\)-equivariant and explicitly work in 3D Euclidean space via \(\mathcal{G}_{3,0,0}\), improve the quality of the predictions, reaching lower losses during training, boosting the accuracy, measured as GDT scores, by 3% compared to linear layers, and offering a geometric interpretation of intermediate layers' outputs as 3D protein structures.

Chapter 4

3D Pose Estimation

Don't just stand there, let's get to it,
strike a pose — there's nothing to it.


— “Vogue”, Madonna (1990)

4.1 Introduction

In this chapter, we explore two key problems in 3D pose estimation: camera pose estimation and 3D line registration.

In Section 4.2, we introduce CGAPoseNet and its enhanced version CGAPoseNet+GCAN, two Geometric Algebra (GA)-based approaches for camera pose regression from RGB images. CGAPoseNet represents camera poses using motors, which unify rotations and translations into a single mathematical object, eliminating the need for complex loss tuning or 3D point cloud supervision. However, it only predicts motor coefficients without considering their geometric meaning. To address this, we incorporate the Geometric Clifford Algebra Network (GCAN), making CGAPoseNet+GCAN geometry-aware and capable of processing multivector-valued inputs directly in GA space. This results in state-of-the-art performance on 13 datasets while reducing the number of parameters and improving pose estimation accuracy.

In Section 4.3, we introduce Define, Refine, Align (DRA), a pipeline for aligning unordered 3D line sets without requiring prior correspondences. By leveraging GA in \(\mathcal{G}_{4,0,0}\), DRA represents line bundles and transformations as multivectors, enabling direct pose estimation without the computational burden of conventional correspondence-based methods. DRA achieves up to 89.1% accuracy improvements over baselines, demonstrating the potential of geometry-informed pipelines for robust 3D registration.

4.2 Camera Pose Estimation via CGAPoseNet+GCAN

4.2.1 Background

Camera pose regression is the process of estimating the 3D position and orientation (i.e. the pose) of a camera relative to a given object or scene. Camera pose regression has found application in augmented reality [92, 124, 168], object tracking [175, 66, 120], localization and mapping [48, 171, 13] and three-dimensional (3D) reconstruction [86, 4, 202].

Before deep learning (DL), camera pose regression was performed through traditional computer vision algorithms. These include: (i) feature-based algorithms, such as SIFT [119] or SURF [11], which extract the camera pose by matching features (edges, corners, ad hoc descriptors, etc.) across multiple views of the scene and triangulating them; (ii) iterative methods, such as Perspective-n-Point (PnP) [50] and Bundle Adjustment (BA) [173], that minimize an objective function such as the reprojection error between 2D points on the image and 3D points in space; (iii) structure from motion [63, 54], that jointly reconstructs the 3D geometry from multiple views and estimates the camera pose. These methods are generally very accurate, but they require special handling of outliers (e.g. through the RANSAC algorithm [50]) and precisely crafted features, and they generally fail under large viewpoint changes or in the presence of occlusions. One of the earliest examples of DL approaches to camera pose regression is found in [90], in which information about the scene is extracted directly from the RGB images with a convolutional neural network (CNN), which does not require hand-crafted feature descriptors. Despite the success of CNN approaches in the literature [14, 15, 198, 88, 89, 127], we believe that most pose regression problems via DL suffer from two major drawbacks: (i) they require separate handling of the rotation and translation components, as these sit in two different mathematical spaces, and (ii) they are geometry-agnostic and ignore the structure of the scene being recorded, unlike more traditional computer vision approaches.

The GCAN, on the other hand, sits in the same mathematical space as the predictions (see Figure 4.1). The use of motors in CGAPoseNet solves the separate treatment of rotation and translation components, but CGAPoseNet, like most regression problems, has the limitation of predicting poses without knowledge about the geometry of the scene. With CGAPoseNet+GCAN we let the backbone predict a set of proposals of motor coefficients (rather than a single motor as CGAPoseNet does) which are then transformed into actual motors and fed into the GCAN, which operates on them. This enables a geometry-aware approach in which inputs and outputs of the GCAN layers are also camera poses. This allows for better understanding of the geometry of the scene, better generalizability on previously unseen data and better interpretability of the intermediate layers' outputs.

Our CGAPoseNet+GCAN architecture significantly reduces the translation and rotation error with respect to both CGAPoseNet, from which we borrow the pose representation and the loss function, and PoseNet with geometric reprojection error loss, which is the best performing PoseNet strategy. Moreover, CGAPoseNet+GCAN adds only a few trainable parameters to the base PoseNet approach, but with a simpler MSE loss function to be minimized and no additional information about the 3D scene required (e.g., no 3D point cloud information is necessary), and it has about 4 million fewer parameters than CGAPoseNet.

4.2.2 Problem definition

Camera pose regression means predicting, in a supervised fashion, the camera pose \(\mathbf{p} \in \text{SE}(3)\) , with \(\text{SE}(3) \triangleq \{(\mathbf{R}, \mathbf{t}) : \mathbf{R} \in \text{SO}(3), \mathbf{t} \in \mathbb{R}^3\}\) for a given frame of a video capture of a scene. The translation component is generally represented as a 3D vector in \(\mathbb{R}^3\) . The rotation component, on the other hand, can be parametrized in multiple ways including rotation matrices, quaternions, Euler angles, axis-angles representations, rotors, bivectors and more.

The impact of the rotation representation in machine learning has been widely studied [192, 18, 147, 30]. The gimbal lock of Euler angles or the double coverage of quaternions, for example, negatively impact the regression quality. The discontinuity in the mapping from the rotation matrix \(\mathbf{R} \in \text{SO}(3)\) onto a given representation space has also been highlighted as a potential limiting factor [201, 146, 161].

Hence, in camera pose regression problems, two things have to be taken into account: (i) the choice of a rotation representation suitable for the learning algorithm and (ii) the weighting of the translation and rotation components. In [90], the rotational and the translational part are empirically weighted together as follows

\[\mathcal{L}_\beta = \mathcal{L}_t + \beta \mathcal{L}_q = \|\hat{\mathbf{t}} - \mathbf{t}\|_2 + \beta \left\| \hat{\mathbf{q}} - \frac{\mathbf{q}}{\|\mathbf{q}\|} \right\|_2 \quad (4.1)\]

in which \(\mathbf{q}\) is the rotation expressed as a quaternion and \(\beta\) is a weighting scalar. However, the choice of \(\beta\) is non-trivial and a grid search is required. The optimal value was found to be “the ratio between expected error of position and orientation at the end of training”, which is not intuitive. Moreover, the value of \(\beta\) varies significantly for each dataset, even if the volumes spanned by the datasets are comparable. For example, for the indoor datasets, which are all \(\leq 18\text{m}^3\) , the optimal \(\beta\) was found to be \(\beta \in [120, 750]\) .
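As a quick illustration, Eq. 4.1 amounts to the following (a minimal sketch; the tensor/batch framing of an actual training loop is omitted):

```python
import numpy as np

def beta_loss(t_hat, t, q_hat, q, beta):
    """Eq. 4.1: L_beta = ||t_hat - t||_2 + beta * ||q_hat - q / ||q|| ||_2."""
    return (np.linalg.norm(t_hat - t)
            + beta * np.linalg.norm(q_hat - q / np.linalg.norm(q)))
```
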

A similar range of values of \(\beta\) is seen in Walch et al., who also used Equation 4.1 in [181]: the pretrained GoogLeNet is followed by LSTM modules with two different fully connected layers, one for the position and one for the orientation, as last layers.

The loss function of Equation 4.1 has also been employed in [47]. In it, a pretrained ResNet50 convolutional neural network is used to extract features for each image, which are then reshaped into graph form and input to a graph neural network (GNN) to predict position and orientation. This more complex architecture allowed Elmoogy et al. to be less strict on the choice of \(\beta\) and empirically fix \(\beta = 10\) for indoor scenes and \(\beta = 200\) for outdoor scenes.

A similar weighting has been proposed in [196]. Xu et al. employed 2D trajectories of pedestrians to estimate the camera pose rather than RGB images only, and found that the weight parameter does not have a significant impact on the regression accuracy.

Weighting the translational and rotational parts is hence heavily dependent on the kind of datasets available and the chosen architecture.

The authors of [90] proposed a more advanced weighting strategy in [89]:

\[\mathcal{L}_\sigma = \mathcal{L}_t \exp(-\hat{s}_t) + \hat{s}_t + \mathcal{L}_q \exp(-\hat{s}_q) + \hat{s}_q \quad (4.2)\]

with \(\hat{s} := \log(\hat{\sigma}^2)\) being a learned weight and \(\sigma^2\) the variance of the corresponding variable, either \(\mathbf{t}\) or \(\mathbf{q}\) , modelled through homoscedastic uncertainty. This probabilistic deep learning approach is superior to the \(\beta\) -weighting, but nonetheless still a weighting approach, with \(\sigma_q^2\) and \(\sigma_t^2\) to be learned and possibly differing from each other by several orders of magnitude.

Also in [89], a weighting-free approach is suggested: geometric reprojection error is used to combine the rotational and translational components into a single scalar loss. The geometric reprojection function \(\pi\) is introduced, that maps a 3D point \(\mathbf{g}\) to 2D image coordinates \((u, v)\) :

\[\pi(\mathbf{t}, \mathbf{q}, \mathbf{g}) \mapsto (u, v) \quad (4.3)\]

where \(\pi\) is defined via

\[(u', v', w') = \mathbf{K}(\mathbf{R}\mathbf{g} + \mathbf{t}) \quad (4.4)\]

with \((u, v) = (u'/w', v'/w')\) , \(\mathbf{K}\) the intrinsic camera calibration matrix and \(\mathbf{R} \in \text{SO}(3)\) the rotation matrix corresponding to \(\mathbf{q}\) . The proposed loss takes the norm of the reprojection error between the predicted and ground truth camera pose:

\[\mathcal{L}_{\mathbf{g}} = \frac{1}{|\mathcal{G}'|} \sum_{\mathbf{g}_i \in \mathcal{G}'} \|\pi(\mathbf{t}, \mathbf{q}, \mathbf{g}_i) - \pi(\hat{\mathbf{t}}, \hat{\mathbf{q}}, \mathbf{g}_i)\|_1 \quad (4.5)\]

in which \(\mathcal{G}'\) is the subset of all the points \(\mathbf{g}\) visible in image \(I\) . Despite the high accuracy of this approach, the amount of computation required at each learning iteration is significantly higher than that required by Equations 4.1-4.2. In addition, further discussion is needed to choose the most appropriate norm to be minimized.
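Equations 4.3-4.5 can be sketched as follows (a minimal numpy version; the rotation is taken directly as a matrix \(\mathbf{R}\) rather than converted from \(\mathbf{q}\), and outlier/visibility handling is omitted):

```python
import numpy as np

def project(K, R, t, g):
    """Eqs. 4.3-4.4: map a 3D point g to image coordinates (u, v)."""
    u, v, w = K @ (R @ g + t)
    return np.array([u / w, v / w])

def reprojection_loss(K, R, t, R_hat, t_hat, points):
    """Eq. 4.5: mean L1 norm of the reprojection error over the visible points G'."""
    return np.mean([np.abs(project(K, R, t, g) - project(K, R_hat, t_hat, g)).sum()
                    for g in points])
```

Note that each loss evaluation re-projects every visible 3D point under both poses, which is the extra per-iteration cost mentioned above.
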

The dichotomy between rotational and positional components is also present in works that adopt completely different regression strategies. This is the case in Chen et al., who suggested an ad-hoc parameterization in [31]: DirectionNet factorizes relative camera pose, specified by a 3D rotation and a translation direction, into a set of 3D direction vectors: the relative pose between two images, however, is still inferred in two steps for the rotation (DirectionNet-R) and the translation (DirectionNet-T) components.

Works like [31, 89] show that efforts in unifying the rotational and translational components correspond to better positional and rotational estimation. On the other hand, they are significantly less intuitive compared to the original PoseNet work [90]. In this work we wish to preserve the PoseNet pipeline, which is simple and successful, but also to avoid the rotational and translational weighting. We do so through a new mathematical representation for camera pose.

4.2.3 Method

4.2.4 Poses as motors in 1D-Up CGA

We represent poses in 1D-Up CGA \(\mathcal{G}_{4,0,0}\) (which we will refer to as \(\mathcal{G}_{4,0}\)) [95–97]. Full CGA, on the other hand, extends a GA \(\mathcal{G}_{p,q}\) to \(\mathcal{G}_{p+1,q+1}\), hence it requires 2 extra dimensions. \(\mathcal{G}_{4,0,0}\) has four basis vectors \(\{e_1, e_2, e_3, e_0\}\), for which \(e_i^2 = +1 \ \forall i \in \{0, 1, 2, 3\}\). The \(\mathcal{G}_{4,0}\) space has constant curvature \(\lambda\) and it represents a spherical geometry. While it may seem counter-intuitive at first, modelling the real world in spherical space allows for: (i) a Euclidean signature space, which is likely the main reason behind the speedy convergence of the loss during training, and (ii) a representation for the pose with few parameters, as only 1 extra dimension is needed.

A point \(x \in \mathcal{G}_{3,0}\) is mapped to \(X \in \mathcal{G}_{4,0}\) through the function \(f : x \mapsto X\):

\[X = f(x) = \left( \frac{2\lambda}{\lambda^2 + x^2} \right) x + \left( \frac{\lambda^2 - x^2}{\lambda^2 + x^2} \right) e_0. \quad (4.6)\]

It can be shown that translating and rotating in \(\mathcal{G}_{4,0}\) can both be done through rotors [97, 95]. Given a translation vector \(\mathbf{t} \in \mathcal{G}_{3,0}\) , its corresponding rotor in 4D spherical geometry is given by:

\[T = g(\mathbf{t}) = \frac{\lambda + \mathbf{t}e_0}{\sqrt{\lambda^2 + \mathbf{t}^2}} \quad (4.7)\]

A rotor \(R\) in 3D Euclidean geometry is still \(R\) in 4D spherical geometry. The rigid body motion, i.e. translation and rotation, of an object \(X\) into \(X'\) in the 1D-Up CGA can hence be expressed as the combination of two sandwich products:

\[X' = TRX\tilde{R}\tilde{T} = MX\tilde{M} \quad (4.8)\]

The geometric product \(M = TR\) yields a motor, which represents a rotation and a translation. Note how rotations and translations are now expressed in the same units. Motors are objects (multivectors) in \(\mathcal{G}_{4,0}\) with only even blades, presenting 1 scalar, 6 bivector and 1 quadrivector components:

\[M = \underbrace{x \cdot 1}_{\text{scalar}} + \underbrace{x_{12}e_{12} + x_{13}e_{13} + x_{01}e_{01} + x_{23}e_{23} + x_{02}e_{02} + x_{03}e_{03}}_{\text{bivector}} + \underbrace{x_{0123}e_{0123}}_{\text{quadrivector}} \quad (4.9)\]

Since motors combine translations and rotations, they can be employed as a pose representation with 8 parameters (i.e. the 8 coefficients). An object in 1D-Up CGA \(X\) can be projected back onto 3D space via:

\[x = f^{-1}(X) = \frac{\lambda}{1 + X \cdot e_0} [(X \cdot e_1)e_1 + (X \cdot e_2)e_2 + (X \cdot e_3)e_3] \quad (4.10)\]
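Equations 4.6 and 4.10 are inverses of each other; a small numpy sketch (the coefficient order \((e_1, e_2, e_3, e_0)\) is our convention):

```python
import numpy as np

def up(x, lam):
    """Eq. 4.6: map x in R^3 to the 1D-Up CGA point X, coefficients on (e1, e2, e3, e0)."""
    x2 = x @ x
    return np.append(2 * lam * x, lam**2 - x2) / (lam**2 + x2)

def down(X, lam):
    """Eq. 4.10: project X back onto 3D Euclidean space."""
    return lam * X[:3] / (1 + X[3])
```

Note that \(X\) always has unit norm, i.e. mapped points lie on a 3-sphere, consistent with the spherical geometry of \(\mathcal{G}_{4,0}\).
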

2c41e286ea9a1b008d3beb348d86096f_img.jpg

The diagram illustrates the CGAPoseNet+GCAN architecture. On the left, 'Frames' are input to the 'InceptionV3 backbone'. The backbone outputs 'Motor Coefficients Proposals' (256 sets of 8 coefficients), which are then reshaped. Below the backbone, 'Labels' are provided, consisting of 'Camera Position' (\(\mathbf{t} \in \mathbb{R}^3\)), 'Camera Orientation' (\(\mathbf{R} \in \text{SO}(3)\)) and 'Camera Pose' (\(M \in \mathcal{G}_{4,0}\)). The reshaped proposals are processed by a 'Tensor to Geometric' conversion, followed by a series of 'GSP Dense - relu' blocks (with 256, 128 and 64 units, respectively, where GSP denotes the Geometric Sandwich Product), a final 'GSP Dense - tanh' block and a 'Geometric to Tensor' conversion, yielding a single set of 'Motor Coefficients (output)'. The 'Geometric to Tensor' conversion shows a 'prediction' (blue shape) alongside the 'ground truth' (red shape).


Fig. 4.1 The CGAPoseNet+GCAN architecture. The output of the InceptionV3 network is reshaped to obtain a set of motor coefficient proposals. Motors are objects in the 1D-Up Algebra \(\mathcal{G}_{4,0}\) with scalar, bivector and quadrivector parts, giving a total of 8 real coefficients. These coefficients are used to build motors \(\in \mathcal{G}_{4,0}\) as input to the Geometric Clifford Algebra Network (GCAN). A motor represents a rotation and a translation, and it is hence a suitable representation for camera poses. The GCAN works in \(\mathcal{G}_{4,0}\) space and has weights, biases and outputs that are also motors, and hence interpretable as poses. The GCAN narrows down the proposals to a single motor through a geometric understanding of the scene.

4.2.5 Architecture: InceptionV3 + sandwich product layers

The key element in our approach is the GCAN added at the output of the backbone. We call our architecture CGAPoseNet+GCAN because, like in CGAPoseNet, we also represent poses with motors in 1D-Up CGA, which unify translation and rotation with a single object that sits in a space with Euclidean signature, and we keep the mean squared error (MSE) function as a loss to guide the training.

CGAPoseNet, however, does not really work in 1D-Up CGA space, as it only learns to predict poses expressed as motor coefficients. It does so based on patterns in the data, without understanding the poses' geometrical meaning or learning how to perform geometrical transformations on them. We believe that this is a key limitation of the CGAPoseNet approach, which explains why it does not significantly surpass PoseNet paired with the geometric reprojection error, which includes information about 3D points of the scene in its loss function and is therefore a proper geometry-aware approach.

We modify CGAPoseNet by reshaping the penultimate layer of the backbone from 2048 into \(256 \times 8\) . We refer to this output as motor proposals, since the backbone now predicts 256 sets of 8 motor coefficients rather than a single set of motor coefficients as in CGAPoseNet. Proposals are then employed to build motors \(M_i\) and fed as input to the GCAN. The GCAN explicitly works in \(\mathcal{G}_{4,0}\) and it consists of 3 sandwich product dense layers, whose outputs obey

\[h(M) = \sum_{i=1}^c \left( W_i M_i \tilde{W}_i + B_i \right) \quad (4.11)\]

where \(c\) is the number of channels, \(M = \{M_i\}_{i=1}^c\) is the set of motors per channel, \(W_i\) are the weights and \(B_i\) the biases. Note that we employ the uppercase notation since \(W_i, M_i, B_i \in \mathcal{G}_{4,0}\) and all of them only contain even blades. This means that (i) each neuron in the layer encodes a geometric transformation of its input, preserving the grade of the objects as described in Section 2.1, and hence (ii) each output of the GCAN layers is also a (unnormalized) motor in 1D-Up CGA. The GCAN layers have 128, 64 and 1 neurons, respectively: the 256 proposals are progressively downsampled until the optimal pose is found (see Figure 4.1). A pipeline with 128-64-32-1 neurons has also been tested, without significant difference.
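A minimal numpy sketch of the sandwich product dense layer of Eq. 4.11 (full 16-component multivectors in \(\mathcal{G}_{4,0}\); an actual GCAN layer would constrain the weights to even blades and learn them, which we omit here):

```python
import numpy as np

def _sign(a, b):
    # Reordering sign when multiplying bitmask-encoded blades (all e_i^2 = +1).
    s, a = 1, a >> 1
    while a:
        if bin(a & b).count("1") % 2:
            s = -s
        a >>= 1
    return s

def gp(x, y):
    """Geometric product of 16-component multivectors in G(4,0)."""
    out = np.zeros(16)
    for a in range(16):
        for b in range(16):
            out[a ^ b] += _sign(a, b) * x[a] * y[b]
    return out

GRADES = np.array([bin(m).count("1") for m in range(16)])

def rev(x):
    """Reverse W~: grade k picks up a factor (-1)^(k(k-1)/2)."""
    return x * (-1.0) ** (GRADES * (GRADES - 1) // 2)

def sandwich_layer(M, W, B):
    """Eq. 4.11: h(M) = sum_i (W_i M_i W~_i + B_i), with M, W, B of shape (c, 16)."""
    return sum(gp(gp(W[i], M[i]), rev(W[i])) + B[i] for i in range(M.shape[0]))
```

With identity weights (scalar 1) and zero biases, the layer reduces to a plain sum over channels, which makes the grade-preserving sandwich structure easy to check.
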

We also slightly adapted the backbone in order to reduce the number of trainable parameters (see Table 4.1). CGAPoseNet adds two dense layers to the backbone, with 2048 and 8 neurons, respectively, without removing the last classification layer of InceptionV3, which has 1000 neurons (see Figure 4.2). This bottleneck significantly increases the number of parameters. In CGAPoseNet+GCAN, we remove the classification layer with 1000 neurons and instead reshape the 2048 outputs that precede it.

Table 4.1 Number of trainable parameters for the three approaches.

|  | PoseNet | CGAPoseNet | CGAPoseNet+GCAN |
|---|---|---|---|
| Parameters | 21,782,695 | 25,918,224 | 22,132,520 |

23aceb0e7d3c1a644294899d9047df05_img.jpg

Diagram of the original CGAPoseNet pipeline. The input consists of 'Frames' (represented by three green rectangles). These are processed by an 'InceptionV3 backbone' (yellow block). The output is split into two branches: one for the 'Labels' (Camera Position \(\mathbf{t} \in \mathbb{R}^3\) and Camera Orientation \(\mathbf{R} \in \text{SO}(3)\)), converted to a pose \(M \in \mathcal{G}_{4,0}\) via a '1D-Up CGA' block, and another for the 'Motor Coefficients (output)', obtained by appending dense layers (blue blocks) after the backbone's 1000-neuron classification layer.

Fig. 4.2 The original CGAPoseNet pipeline, which is geometry-agnostic and adds a significant number of parameters compared to baseline PoseNet.

4.2.6 Experiments

4.2.7 Datasets: Cambridge Landmarks and 7 Scenes

We followed [90, 88, 89, 140] and tested our approach on datasets of both indoor and outdoor scenes, for a total of 13 different datasets. The indoor dataset, 7 Scenes, was introduced in [163] and it includes Chess, Fire, Heads, Office, Pumpkin, Red Kitchen and Stairs datasets. The videos have been recorded with a Kinect RGB-D sensor and they all span a volume of less than \(20m^3\) .

The outdoor dataset, Cambridge Landmarks, was first presented along with PoseNet in [90]. It includes 6 datasets (Great Court, King's College, Old Hospital, Shop Facade, St. Mary's Church and Street). The variability between scenes is significant, with Shop Facade spanning an area of \(875m^2\) and Street covering \(50000m^2\). The similarity between train and test sets is also variable.

Each dataset includes RGB images extracted from the scene and labels of the position \(\mathbf{t} = [x, y, z]\) and orientation \(\mathbf{R}\) of the camera, expressed as either rotation matrices or quaternions, given an arbitrary reference frame. We converted labels into motors \(M\) and picked the curvature of the space \(\lambda\) (see Equation 4.7) to be proportional to the area spanned by the scene as described in [140].

4.2.8 Error metrics

Given a predicted motor \(\hat{M}\) and a ground truth motor \(M\) , we measure the quality of the predicted pose by decomposing motors into their rotation and translation components and measure (i) translation error and (ii) rotation error.

We followed the procedure described in [140] and decomposed the motor \(M\) into a translation vector \(\mathbf{t} \in \mathbb{R}^3\) , the translation component, and into a rotor \(R \in \mathcal{G}_{3,0}\) , the rotation component.

We define the translation error between original position \(\mathbf{t}\) and predicted position \(\hat{\mathbf{t}}\) as:

\[\epsilon_t = \|\hat{\mathbf{t}} - \mathbf{t}\|_1 \quad (4.12)\]

in a similar way to that described in [89, 91, 140]. The rotation error between a ground truth rotor \(R\) and predicted rotor \(\hat{R}\) is derived from [146, 201] and consistent with [140]. It is defined as:

\[\epsilon_\rho = 2 \arccos(\langle R\tilde{\hat{R}} \rangle_0) \quad (4.13)\]

where \(\langle \cdot \rangle_0\) denotes the component with grade 0, i.e. the scalar part of the geometric product. Since \(R\tilde{R} = 1\), the error goes to \(0^\circ\) as \(\hat{R}\) approaches \(R\).
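The two error metrics can be sketched in a few lines of NumPy. This is an illustrative implementation, not the dissertation's code: it assumes rotors are stored as their 4 real coefficients (scalar, \(e_{12}\), \(e_{13}\), \(e_{23}\)), in which case the scalar part of the product of one unit rotor with the reverse of the other reduces to a dot product of coefficient vectors, exactly as for unit quaternions.

```python
import numpy as np

def translation_error(t_hat, t):
    """L1 translation error (Eq. 4.12)."""
    return np.abs(np.asarray(t_hat, dtype=float) - np.asarray(t, dtype=float)).sum()

def rotation_error_deg(r_hat, r):
    """Rotation error (Eq. 4.13), in degrees.

    Rotors in G(3,0) are stored as 4 coefficients (scalar, e12, e13, e23);
    the grade-0 part of the product of one rotor with the reverse of the
    other is then the dot product of the two coefficient vectors.
    """
    c = float(np.clip(np.dot(r, r_hat), -1.0, 1.0))  # clip guards arccos domain
    return np.degrees(2.0 * np.arccos(c))
```

For example, a rotor encoding a \(90^\circ\) rotation in the \(e_{12}\) plane, \((\cos 45^\circ, \sin 45^\circ, 0, 0)\), compared against the identity rotor \((1, 0, 0, 0)\) yields an error of \(90^\circ\).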

4.2.9 Implementation details

CGAPoseNet+GCAN has been trained in a supervised fashion with only RGB images \(I\) as inputs and camera poses expressed as motors \(M\) as labels. Weights are initialized starting from ImageNet [41, 90]. We employed an 80-20 train-validation split, a batch size of \(B = 64\) and a number of epochs \(E = 100\) . Adam has been chosen as optimizer with exponentially decaying learning rate, with initial value \(\eta = 10^{-4}\) and decay rate of 0.98. The rate of decay has been adjusted based on the training set dimension. To avoid overfitting, we implemented early stopping with patience \(P = 12\) and restored the best weights based on the validation loss.

The training procedure adopted differs slightly between indoor and outdoor datasets. For Cambridge Landmarks, the network has been re-trained twice with decreasing starting learning rate, namely \(\eta = \{10^{-4}, 10^{-5}, 10^{-6}\}\) and keeping the weights from the last training. For 7 Scenes the network has been trained once.
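The schedule and stopping rule described above are easy to state in code. Below is a minimal pure-Python sketch (the dissertation's pipeline presumably uses the equivalent Keras utilities; the names here are illustrative):

```python
import math

def lr_schedule(epoch, eta0=1e-4, decay_rate=0.98):
    """Exponentially decaying learning rate: eta0 * decay_rate**epoch."""
    return eta0 * decay_rate ** epoch

class EarlyStopping:
    """Stop after `patience` epochs without validation-loss improvement,
    remembering the best epoch so its weights can be restored."""
    def __init__(self, patience=12):
        self.patience = patience
        self.best = math.inf
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        if val_loss < self.best:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
            return False                      # improvement: keep training
        self.wait += 1
        return self.wait >= self.patience     # True -> stop, restore best_epoch
```

Re-training on Cambridge Landmarks then amounts to running this loop three times with `eta0` set to \(10^{-4}\), \(10^{-5}\) and \(10^{-6}\), carrying the weights over each time.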

![Training and validation losses (MSE) for the Chess dataset](44229cd77a85081f3968b4bf05d232d3_img.jpg)

(a) Chess: training (solid) and validation (dashed) losses (MSE) vs. epochs for CGAPoseNet+GCAN (red), CGAPoseNet (blue) and CGAPoseNet without bottleneck (green). CGAPoseNet+GCAN attains the lowest loss, stabilizing near 0.000 MSE; CGAPoseNet stabilizes around 0.001 MSE and CGAPoseNet without bottleneck around 0.002 MSE.

![Training and validation losses (MSE) for the Fire dataset](73e9791d33fc9c524278fb216945435f_img.jpg)

(b) Fire: CGAPoseNet+GCAN and CGAPoseNet both stabilize near 0.0000 MSE, while CGAPoseNet without bottleneck stabilizes around 0.006 MSE.

![Training and validation losses (MSE) for the Heads dataset](145582dc8830099a0aa6ca1234f0acd1_img.jpg)

(c) Heads: CGAPoseNet+GCAN and CGAPoseNet both stabilize near 0.0000 MSE, while CGAPoseNet without bottleneck stabilizes around 0.005 MSE.

Fig. 4.3 Train (solid line) and validation (dashed line) losses for selected datasets. CGAPoseNet+GCAN attains the lowest loss profile due to its Geometric Clifford Algebra layers.

The loss we minimize is

\[\mathcal{L} = \text{MSE}(M, \hat{M}) \quad (4.14)\]

where \(\hat{M}\) and \(M\) are the predicted and ground truth motors, as in [140]. The training time does not differ noticeably from that of the simple CGAPoseNet, both measured at around 4 s/step.

The code has been run on an NVIDIA Tesla T4 GPU at 1.59 GHz. The backbone architecture has been implemented via the Keras API of TensorFlow, while the GCAN has been implemented via the TensorFlow Geometric Algebra library [82]. Operations in Geometric Algebra have been handled through Clifford [61].

4.2.10 Results

Table 4.2 Median translation and rotation errors over the test set for the 7 approaches.

| Scene | PoseNet [90] | Bayesian PoseNet [88] | PoseNet LSTM [181] | PoseNet \(\sigma^2\) Weights [89] | PoseNet Geom. Repr. [89] | CGAPoseNet | CGAPoseNet+GCAN |
|---|---|---|---|---|---|---|---|
| Great Court | - | - | - | 7.00m, 3.65° | 6.83m, 3.47° | 3.77m, 4.27° | 3.88m, 3.21° |
| King's | 1.92m, 5.40° | 1.74m, 4.06° | 0.99m, 3.65° | 0.99m, 1.06° | 0.88m, 1.04° | 1.36m, 1.85° | 1.00m, 1.16° |
| Old Hospital | 2.31m, 5.38° | 2.57m, 5.14° | 1.51m, 4.29° | 2.17m, 2.94° | 3.20m, 3.29° | 2.52m, 2.90° | 1.79m, 2.28° |
| Shop | 1.46m, 8.08° | 1.25m, 7.54° | 1.18m, 7.44° | 1.05m, 3.97° | 0.88m, 3.78° | 0.74m, 3.84° | 1.19m, 3.43° |
| St. Mary's | 2.65m, 8.48° | 2.11m, 8.38° | 1.52m, 6.68° | 1.49m, 3.43° | 1.57m, 3.32° | 2.12m, 2.97° | 1.60m, 2.94° |
| Street | - | - | - | 20.74m, 25.5° | 20.34m, 25.5° | 19.6m, 19.9° | 19.0m, 19.4° |
| Chess | 0.32m, 6.60° | 0.37m, 7.24° | 0.24m, 5.77° | 0.24m, 5.77° | 0.13m, 4.48° | 0.26m, 6.34° | 0.10m, 3.58° |
| Fire | 0.47m, 14.0° | 0.43m, 13.7° | 0.34m, 11.9° | 0.27m, 11.8° | 0.27m, 11.3° | 0.28m, 10.3° | 0.15m, 6.30° |
| Heads | 0.30m, 12.2° | 0.31m, 12.0° | 0.21m, 13.7° | 0.18m, 12.1° | 0.17m, 13.0° | 0.17m, 7.98° | 0.12m, 8.15° |
| Office | 0.48m, 7.24° | 0.48m, 8.04° | 0.30m, 8.08° | 0.20m, 5.77° | 0.19m, 5.55° | 0.26m, 7.23° | 0.14m, 3.11° |
| Pumpkin | 0.49m, 8.12° | 0.61m, 7.08° | 0.33m, 7.00° | 0.25m, 4.82° | 0.26m, 4.75° | 0.22m, 5.18° | 0.17m, 3.84° |
| Red Kitchen | 0.58m, 8.34° | 0.58m, 7.54° | 0.37m, 8.83° | 0.24m, 5.52° | 0.23m, 5.35° | 0.55m, 16.7° | 0.15m, 3.76° |
| Stairs | 0.48m, 13.1° | 0.48m, 13.1° | 0.40m, 13.7° | 0.37m, 10.6° | 0.35m, 12.4° | 0.17m, 12.0° | 0.19m, 8.30° |

Results are summarized in Table 4.2. We report median translation and rotation errors, consistent with [90, 88, 89, 91, 140], for 7 different approaches: 5 PoseNet variants with different loss functions, CGAPoseNet and CGAPoseNet+GCAN (ours). By predicting a single mathematical object, the motor, through a geometry-aware network, our approach significantly reduces both errors, achieving the lowest rotation error on 11 out of 13 datasets and the lowest translation error on 8 out of 13 datasets. For the 7 Scenes datasets, even the mean errors via CGAPoseNet+GCAN are below the others reported in Table 4.2.

To verify that the improvement indeed comes from the GCAN layers, we report training and validation losses in Figure 4.3. We compare CGAPoseNet, CGAPoseNet+GCAN and CGAPoseNet without bottleneck, i.e. with the last classification layer of the backbone removed and only one dense layer with 8 neurons added. We do this so that CGAPoseNet without bottleneck and CGAPoseNet+GCAN

![Ground truth and predicted translation components for King's, Shop and Pumpkin](111c2953ce033bb5d26aaa0b071e8a7f_img.jpg)

The figure consists of three 3D scatter plots, each showing the translation components (X, Y, Z) of the camera poses for one dataset:

  • **(a) King's:** the data points form a dense, elongated cloud. The legend indicates: red 'x' for the ground truth \(\mathbf{t}\), green dots for \(\mathbf{t}\) predicted by CGAPoseNet, and blue plus signs for \(\mathbf{t}\) predicted by CGAPoseNet+GCAN.
  • **(b) Shop:** the data points are more scattered than in (a).
  • **(c) Pumpkin:** the data points form a distinct, winding path.

Fig. 4.4 Ground truth and predicted translation component of the pose over the test set for selected datasets.

![Rotation components for the Pumpkin dataset](98f864103f205c3eeb1f0574f26e0b56_img.jpg)

3D scatter plot of the rotation bivector components \(e_{12}\), \(e_{13}\), \(e_{23}\) for the Pumpkin dataset, comparing ground truth (red 'x') with CGAPoseNet (green) and CGAPoseNet+GCAN (blue) predictions: the CGAPoseNet predictions are more scattered, while the CGAPoseNet+GCAN predictions cluster tightly around the ground truth.

(a) Pumpkin

![Rotation components for the Great Court dataset](4c20a034ca2df4c039226701035aeecd_img.jpg)

The same comparison for the Great Court dataset.

(b) Great Court

![Rotation components for the Red Kitchen dataset](a7a4135af81158a102df055b2138f981_img.jpg)

The same comparison for the Red Kitchen dataset.

(c) Red Kitchen

Fig. 4.5 Ground truth and predicted rotation component of the pose over the test set for selected datasets.

![Translation error distributions for Red Kitchen and Old Hospital](188a44725b154c1f37855797d6a174f4_img.jpg)

Four subplots in a 2x2 grid: the top row shows the probability density function (PDF) of the translation error in metres, the bottom row the cumulative distribution function (CDF), comparing CGAPoseNet+GCAN (blue dashed) with CGAPoseNet (orange dashed). On Red Kitchen, CGAPoseNet+GCAN shows a much higher peak at low error (0-0.1m) and a lower overall error distribution; on Old Hospital the distributions are more similar, with CGAPoseNet+GCAN still generally better.

(a) PDF (top) and CDF (bottom) for the Red Kitchen dataset

(b) PDF (top) and CDF (bottom) for the Old Hospital dataset

Fig. 4.6 Translation error over the test set for selected datasets.

![Rotation error distributions for Red Kitchen and Old Hospital](2e8dd34433bac31ce3d0d503abd0210e_img.jpg)

Four subplots in a 2x2 grid: the top row shows the probability density function (PDF) of the rotation error in degrees, the bottom row the cumulative distribution function (CDF), comparing CGAPoseNet+GCAN (blue dashed) with CGAPoseNet (orange dashed). On Red Kitchen, CGAPoseNet+GCAN shows a very high peak at low error (0-5°) and a lower overall error distribution; on Old Hospital the distributions are more similar, with CGAPoseNet+GCAN still generally better.

(a) PDF (top) and CDF (bottom) for the Red Kitchen dataset

(b) PDF (top) and CDF (bottom) for the Old Hospital dataset

Fig. 4.7 Rotation error over the test set for selected datasets.

share the same backbone structure. The validation loss for CGAPoseNet+GCAN is lower than for the other two approaches despite similar training loss profiles, showing (i) that our network generalizes better and (ii) that this is due to the GCAN layers and not to the backbone structure. CGAPoseNet without bottleneck performs worse than standard CGAPoseNet, justifying the bottleneck's presence.

In Figures 4.4-4.5 we display ground truth and predicted translation and rotation components, respectively, after breaking \(M\) down into \(\mathbf{t} \in \mathbb{R}^3\) and \(R \in \mathcal{G}_{3,0}\). In Figure 4.5 we plot the bivector components of the rotor \(R\). CGAPoseNet+GCAN shows more accurate predictions than CGAPoseNet, especially on previously unseen areas of the scene (see Figure 4.5a). The improvement on the rotation component is less evident for outdoor datasets (see Figure 4.5b), but much more visible on indoor datasets (see Figures 4.5a, 4.5c).

A comparison of the error distributions of the predictions via CGAPoseNet and CGAPoseNet+GCAN is given in Figures 4.6-4.7 for the translation and rotation errors, respectively: both errors are noticeably reduced with our geometry-aware approach, in terms of both probability density function (PDF) and cumulative distribution function (CDF).

4.2.11 Geometric interpretation of intermediate outputs

![Motor proposals and intermediate GCAN outputs](77c2cc09f561ea80b58cfeede4428d58_img.jpg)

Two 3D scatter plots: a top-down view of the x-y plane (left) and a side view of the y-z plane (right). The 256 motor proposals (yellow) and the outputs of the three GCAN layers, Output #1 (128, green), Output #2 (64, pink) and Output #3 (1, blue), cluster progressively more tightly around the ground truth pose (red square).

Fig. 4.8 Two views of input and output poses of the GCAN layers for a test image in the Old Hospital dataset (relu activation).

We visualize the outputs of the GCAN layers in Figure 4.8. As the GCAN works exclusively with motors, it is possible to interpret the intermediate layer outputs geometrically, as poses. The motor proposals (in yellow) are

![Input and output poses on the sphere](d38e0cdebe2c60996d3f678bf3d207a6_img.jpg)

Two 3D scatter plots of camera poses distributed on a sphere: the left plot shows the 256 motor proposals (yellow) and the outputs of the three GCAN layers (Output #1: 128, green; Output #2: 64, pink; Output #3: 1, blue); the right plot adds the ground truth (red). The outputs converge towards the ground truth with increasing depth.

Fig. 4.9 Two views of input and output poses of the GCAN layers for a test image in the Old Hospital dataset (tanh activation).

downsampled into progressively fewer poses until converging to the final prediction. Note the difference in scale (as also shown in Figure 4.1) between the outputs and how they cover progressively smaller areas. This explains why results are superior on the 7 Scenes dataset, since the volume of Euclidean space to cover is significantly smaller. The curvature of the poses shows that we are working in the spherical space \(\mathcal{G}_{4,0}\) .

It is worth mentioning that the choice of the activation function influences the area in which poses are distributed, hence affecting how convergence is reached. In Figure 4.9, for example, outputs obtained with a tanh activation function are presented. Predicted poses occupy a hemisphere in Figure 4.8, but cover the full sphere in Figure 4.9: the interpretability of the GCAN's intermediate outputs allows us to design networks that minimize the loss function via different paths, meaning that we can customize the network's prediction strategy based on the geometry of the problem we are trying to solve.

Figure 4.10 shows the average pose at the input and output of each GCAN layer. We plot average poses with and without an additional layer with 32 neurons (output coloured in cyan). As geometric transformations are applied to the input poses by the sandwich-product dense layers, the average pose converges towards the ground truth, following clear traces in 3D space.

Lastly, an ablation study with different backbones has been performed, and results are reported in Table 4.3. Regardless of the backbone employed, CGAPoseNet+GCAN still outperforms the best performing PoseNet strategy in most indoor cases. For outdoor cases, we believe that backbones whose outputs are reshaped into more

![Average input and output poses of the GCAN layers](a8aee5eac3467842490a713c009925f7_img.jpg)

Two 3D plots showing, for a test image, the average pose at the input and output of each GCAN layer (coloured triangles), with a dashed red line tracing the path towards the ground truth.

Fig. 4.10 Average input and output poses of the GCAN layers for a test image in the Old Hospital dataset (tanh activation) with 3 (left) and 4 (right) sandwich product layers in the GCAN.

Table 4.3 Ablation study with different backbones for selected datasets. Results superior to the best PoseNet strategy are in bold.

| Scene | InceptionV3 | VGG16 [165] | VGG19 [165] | ResNet50 [65] | ResNetV250 [65] | Xception [32] | DenseNet121 [78] | MobileNetV3 [77] | EfficientNetB0 [169] |
|---|---|---|---|---|---|---|---|---|---|
| Old Hospital | 1.79m, 2.28° | 3.08m, 3.10° | 1.93m, 1.70° | 13.11m, 13.96° | 10.80m, 3.75° | 2.21m, 3.11° | 1.98m, 2.12° | 12.14m, 11.10° | 1.96m, 2.13° |
| Shop | 1.16m, 3.43° | 5.36m, 15.60° | 4.87m, 14.01° | 5.70m, 23.1° | 5.20m, 6.20° | 1.23m, 3.53° | 3.12m, 2.49° | 7.72m, 18.3° | 4.15m, 6.23° |
| St. Mary's | 1.60m, 2.94° | 2.78m, 4.80° | 2.45m, 4.68° | 1.72m, 3.28° | 1.40m, 3.41° | 1.95m, 4.41° | 2.01m, 3.95° | 16.8m, 31.6° | 6.24m, 7.28° |
| Chess | 0.10m, 3.58° | 0.11m, 2.67° | 0.13m, 3.69° | 0.07m, 2.66° | 0.10m, 3.41° | 0.095m, 3.28° | 0.080m, 2.62° | 0.39m, 15.1° | 0.39m, 15.1° |
| Fire | 0.15m, 6.30° | 0.28m, 8.69° | 0.14m, 9.29° | 0.39m, 20.1° | 0.22m, 6.88° | 0.21m, 6.62° | 0.16m, 6.62° | 0.46m, 33.6° | 0.21m, 7.69° |
| Heads | 0.12m, 8.15° | 0.20m, 11.9° | 0.20m, 10.4° | 0.29m, 15.8° | 0.25m, 14.1° | 0.22m, 12.9° | 0.14m, 8.90° | 0.25m, 12.5° | 0.17m, 9.33° |
| Pumpkin | 0.17m, 3.84° | 0.22m, 3.74° | 0.22m, 4.67° | 0.10m, 2.78° | 0.14m, 4.01° | 0.18m, 4.25° | 0.13m, 3.48° | 0.22m, 4.56° | 0.13m, 3.58° |
| Red Kitchen | 0.15m, 3.76° | 0.02m, 0.45° | 0.02m, 0.43° | 0.01m, 0.45° | 0.02m, 0.41° | 0.02m, 0.53° | 0.02m, 0.60° | 0.70m, 18.61° | 0.08m, 1.92° |
| Stairs | 0.19m, 8.30° | 0.22m, 5.73° | 0.26m, 5.41° | 0.57m, 8.37° | 0.25m, 6.65° | 0.19m, 6.32° | 0.22m, 7.06° | 0.55m, 9.15° | 0.57m, 6.67° |

proposals need to be paired with GCAN layers with more units, to avoid bottlenecks from fast downsampling and to make sure that the volume covered by the scene is thoroughly explored.

4.3 3D Line Alignment with Define, Refine, Align

Three-dimensional (3D) lines are rich and robust descriptors that can be used to represent a 3D scene. As opposed to textureless and sparse 3D point clouds, 3D lines preserve structure by capturing edges and contours [130, 74, 117], they require less storage space [129, 93, 85, 94] and are less sensitive to small noise perturbation [56, 117], making them an ideal means to describe a 3D space.

Despite progress in 3D point cloud registration [152, 183, 194, 195, 133, 182], 3D line registration remains underexplored [116, 80, 189]. Lines are harder to represent (e.g., via Plücker coordinates or bivectors), and their geometric relationships are more complex than those of points. Additionally, the lack of annotated benchmark datasets for 3D lines has hindered research in this area.

Currently, no end-to-end pipeline can directly and robustly register two partially overlapping sets of 3D lines. Most methods rely on offline RANSAC, learning only correspondences rather than directly estimating pose, making them unsuitable for real-time applications. A direct relative pose estimator is needed.

Estimating pose without correspondences enables real-time operation, avoiding iterative-optimization overheads of up to 760 ms (see Tables V and VII in [80]). It mitigates sensitivity to initialization and local minima [7, 187] while handling outliers and incomplete data by leveraging global context. End-to-end learning captures holistic spatial relationships, improving accuracy in cluttered environments [105, 80].

In this section, we wish to fill the gap in the literature on 3D line alignment by introducing Define, Refine, Align (DRA): DRA estimates relative poses between sets of lines directly without evaluating correspondences between them explicitly. DRA is a lightweight pipeline, grounded in geometry, which combines (i) the robust features extracted with PlückerNet [116], (ii) the descriptive power of Clifford/Geometric Algebra [176, 97, 101] and (iii) the effectiveness of Hypercomplex Neural Networks in Geometric Algebra [135, 159, 17] in representing and processing geometrical data, particularly in pose estimation tasks [140, 142].

Among the contributions of DRA, we highlight:

  • real-time relative pose estimation between sets of 3D lines;
  • an intuitive pipeline that performs precise geometrical operations via equivariant and rotational layers;
  • end-to-end interpretability due to the hypercomplex nature of the network;
  • smallest number of trainable parameters compared to all other models in the literature;
  • lowest translational and rotational errors on both indoor and outdoor scenes compared to all correspondence-free architectures reported;
  • lowest mean translational and rotational error on both indoor and outdoor scenes compared to the iterative closest line (ICL) algorithm;
  • lowest mean translational error on outdoor scenes compared to most correspondence-based models.

4.3.1 Related work on partial-to-partial registration

By 3D line alignment we refer to the estimation of the relative pose between two sets of lines in 3D space. It is a task required in many computer vision and robotics applications, including 3D reconstruction [9, 73, 94], camera pose estimation [118, 10, 106], object detection [132], place recognition [172] and SLAM [53].

Early algorithms for 3D line alignment, like the adapted Iterative Closest Point algorithm (ICP) [28] and the Iterative Closest Line algorithm (ICL) [7], establish correspondences by minimizing the distance between lines, defined as the perpendicular distance from a point on a source line to a target line. Such techniques, however, are sensitive to noise and outliers, especially in realistic scenarios with partial overlap between line bundles, which belong to the category of partial-to-partial registration tasks.
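The line-to-line residual that these iterative methods minimise reduces to a point-to-line computation. Below is a minimal sketch of that primitive (the helper name and argument convention are ours, not from [7, 28]):

```python
import numpy as np

def point_to_line_distance(p, a, v):
    """Perpendicular distance from point p to the line through a with direction v.

    Uses the standard identity d = |(p - a) x v| / |v|: the cross product's
    magnitude is the area of the parallelogram spanned by (p - a) and v,
    and dividing by |v| leaves the parallelogram's height, i.e. the distance.
    """
    p, a, v = (np.asarray(x, dtype=float) for x in (p, a, v))
    return np.linalg.norm(np.cross(p - a, v)) / np.linalg.norm(v)
```

ICL-style methods evaluate this distance from sampled points on each source line to candidate target lines, match the closest pairs, and re-estimate the alignment iteratively.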

Hence, most deep learning approaches to the 3D line correspondence problem have developed in this direction, focusing on estimating correspondences between partially overlapping sets of lines, and then, at a later stage, performing an iterative optimisation on the subset of corresponding lines: we will refer to these methods as correspondence-based methods.

PlückerNet [116] was the first end-to-end neural network for solving 3D line alignment starting from partial reconstructions. It is a pipeline that operates on lines represented via Plücker coordinates, and it combines self- and cross-attentional layers for feature extraction, a Sinkhorn algorithm to evaluate the correspondence matrix \(\mathbf{C}\) and a RANSAC solver to establish correspondences.

More recently, MEGNet [170] has been proposed. MEGNet enhances the Feature Extraction module of PlückerNet by pairing the attentional feature extractor with a geometric one to produce an existence probability vector for the two line bundles. Like PlückerNet, it includes a Sinkhorn algorithm and a RANSAC solver. Despite having more than twice the number of parameters of PlückerNet, however, MEGNet presents only mild numerical improvement and limited novelty over it.

Other approaches have been adapted from partial-to-partial registration tasks on different data types. RORNet [189], for example, designed to find overlapping representations of partially overlapping point clouds, has been employed in [170] to operate on partially overlapping Plücker coordinates instead.

These correspondence-based methods share some common traits: (i) they estimate matches rather than poses; (ii) they operate in two steps, with alignment performed offline after the subsampling of the line sets; (iii) due to the iterative nature of the optimisation, they are not suitable for real-time operation. Moreover, all of these models combine very diverse techniques with little geometric intuition behind their joint use (e.g. optimal transport, PointNet-inspired backbones, knowledge distillation, graph representation, self- and cross-attention, the Sinkhorn algorithm, etc.).

We believe that a pipeline grounded in the geometry of the problem can significantly simplify it and directly align sets of lines. To this end, a promising research direction for robust partial-to-partial registration is represented by Hypercomplex Networks.

4.3.2 Method

Problem formulation. We wish to align a bundle (i.e. unordered set) of \(N_S\) lines in the source frame, \(\mathcal{L}_S = \{\ell_i\}_{i=1}^{N_S}\), to a bundle in the target frame, \(\mathcal{L}_T = \{\ell'_j\}_{j=1}^{N_T}\). We do so by working in the spherical space \(G(4, 0, 0)\), also known as the 1D-Up CGA, and regressing the motor \(\hat{M}\) that rotates and translates \(\mathcal{L}_S\) into \(\mathcal{L}_T\). We wish to leave the two line bundles unregistered and directly estimate the rigid body motion \(\hat{M}\). In formulae:

\[\hat{M} = \arg \min_{M \in G(4,0,0)} \sum_{i,j} \mathbf{C}_{ij} \mathcal{D} \left( \ell'_j; M \ell_i \tilde{M} \right) \quad (4.15)\]

where \(\mathcal{D}\) is a distance metric and \(\mathbf{C} \in \mathbb{B}^{N_S \times N_T}\) is a binary correspondence matrix whose entries \(\mathbf{C}_{ij}\) indicate which lines in the source frame are present in the target frame. \(\mathbf{C}\) is usually learnt (see [116, 189, 80]) and used to retain only a subset of lines in each of the bundles; the rigid body motion that aligns corresponding lines in \(\mathcal{L}_S\) to \(\mathcal{L}_T\) is then estimated offline via algorithms like RANSAC.

With Define, Refine, Align, we do not regress \(\mathbf{C}_{ij}\) explicitly, but rather we estimate \(\hat{M}\) directly in a supervised fashion, given the two line bundles as inputs, i.e.:

\[\hat{M} = \mathbf{DRA}(\mathcal{L}_S, \mathcal{L}_T) \quad (4.16)\]

in which \(\mathbf{DRA}(\cdot, \cdot)\) is our proposed hypercomplex network in \(G(4, 0, 0)\) .

4.3.3 3D Lines as bivectors in 1D-Up CGA

Poses as motors in \(G(4, 0, 0)\) . As in Section 4.2.4, most datasets represent poses as a rotation matrix and translation vector \((\mathbf{R}, \mathbf{t})\) , with \(\mathbf{R} \in \text{SO}(3)\) , \(\mathbf{t} \in \mathbb{R}^3\) . These can be readily converted into motors \(M\) .

![Effects of curvature λ on pose estimation](0b8b3bfc048feb76864f1845ea8db92e_img.jpg)

Diagram illustrating the effect of the curvature \(\lambda\): a double-headed arrow connects the spherical space \(G(4,0,0)\) (a blue sphere, with higher \(R\) uncertainty and lower \(\mathbf{t}\) uncertainty) with the Euclidean space \(G(3,0,0)\) (a yellow plane, with lower \(R\) uncertainty and higher \(\mathbf{t}\) uncertainty).

Fig. 4.11 Effects of \(\lambda\) in \(G(4, 0, 0)\) on pose estimation.

To obtain a rotor \(R\) from \(\mathbf{R}\) , we first express the columns of \(\mathbf{R}\) as a linear combination of the basis vectors:

\[\begin{aligned} A' &= [\mathbf{R}_{00}e_1 + \mathbf{R}_{10}e_2 + \mathbf{R}_{20}e_3, \\ &\quad \mathbf{R}_{01}e_1 + \mathbf{R}_{11}e_2 + \mathbf{R}_{21}e_3, \\ &\quad \mathbf{R}_{02}e_1 + \mathbf{R}_{12}e_2 + \mathbf{R}_{22}e_3] \end{aligned} \quad (4.17)\]

Then, the (unnormalized) rotor can be evaluated as

\[R \propto 1 + \sum_{k=1}^3 A_k A'_k \quad (4.18)\]

where \(A = [e_1, e_2, e_3]\). The final rotor is obtained via the normalization \(R = R/\sqrt{R\tilde{R}}\). Note that \(R\) in 4D spherical space is identical to \(R\) in 3D Euclidean space. A translation rotor \(T\), on the other hand, is obtained from the translation vector \(\mathbf{t}\) as shown in Eq. 4.7.
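Eqs. 4.17-4.18 amount to reading off the trace and the antisymmetric part of \(\mathbf{R}\). A NumPy sketch (illustrative, and valid when \(1 + \mathrm{tr}\,\mathbf{R} \neq 0\), i.e. for rotation angles below \(180^\circ\), which the unnormalized formula also requires):

```python
import numpy as np

def rotor_from_matrix(R):
    """Rotor coefficients (scalar, e12, e13, e23) from a 3x3 rotation matrix,
    following R ∝ 1 + sum_k A_k A'_k (Eqs. 4.17-4.18)."""
    R = np.asarray(R, dtype=float)
    # Summing e_k * (k-th column of R) over k gives:
    #   scalar part: 1 + trace(R); bivector parts: antisymmetric differences.
    r = np.array([1.0 + np.trace(R),
                  R[1, 0] - R[0, 1],   # e12 coefficient
                  R[2, 0] - R[0, 2],   # e13 coefficient
                  R[2, 1] - R[1, 2]])  # e23 coefficient
    return r / np.sqrt(r @ r)          # normalise so R * reverse(R) = 1
```

For instance, a \(90^\circ\) rotation about the \(z\) axis yields the rotor \(\cos 45^\circ + \sin 45^\circ\, e_{12}\), mirroring the familiar quaternion-from-matrix construction.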

The geometric product \(M = TR\) yields a motor, which represents a rotation and a translation in spherical space. Motors contain exclusively even blades, presenting 1 scalar, 6 bivector components ( \(e_{01}, e_{02}, e_{03}, e_{12}, e_{13}, e_{23}\) ) and 1 quadrivector component ( \(e_{0123}\) ). Since \(M\tilde{M} = 1\) , a motor is specified by 8 real coefficients and it has 6 degrees of freedom.

Lines as bivectors in \(G(4, 0, 0)\) . Recall from Section 4.2 that a point \(p \in G(3, 0, 0)\) is mapped to \(P \in G(4, 0, 0)\) via

\[P = \left( \frac{2\lambda}{\lambda^2 + p^2} \right) p + \left( \frac{\lambda^2 - p^2}{\lambda^2 + p^2} \right) e_0 \quad (4.19)\]

Conversely, \(P\) can be projected back onto 3D space via:

\[p = \frac{\lambda[(P \cdot e_1)e_1 + (P \cdot e_2)e_2 + (P \cdot e_3)e_3]}{1 + P \cdot e_0} \quad (4.20)\]

Given two points \(p, q\) lying on a line in 3D space, we can map them to \(P, Q\) via Eq. 4.19. A line \(\ell \in G(4, 0, 0)\) is then given by the wedge product of \(P, Q\) , i.e.

\[\ell = P \wedge Q \quad (4.21)\]

This yields a pure bivector of the form

\[\ell = \ell_{01}e_{01} + \ell_{02}e_{02} + \ell_{03}e_{03} + \ell_{12}e_{12} + \ell_{13}e_{13} + \ell_{23}e_{23} \quad (4.22)\]

for which, after the normalisation \(\ell = \ell/\sqrt{\langle \ell \tilde{\ell} \rangle_0}\) , it holds that \(\ell^2 = -1\) . Moreover, defining \(E = [\ell_{01}, \ell_{02}, \ell_{03}]\) and \(B = [\ell_{23}, -\ell_{13}, \ell_{12}]\) , it holds that \(E \cdot B = 0\) . Similarly to the Plücker coordinate representation, \(\ell\) has 4 degrees of freedom. Note that the mapping between lines in 4D space and their counterparts in 3D, as described in this section, is not unique. A line parametrisation in 4D can be obtained directly from the direction and moment vectors \(v\) and \(m\) , without the need for \(P\) and \(Q\) . However, since the results are largely identical, we have chosen to retain the current approach.
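To make the line construction concrete, a small numpy sketch of Eqs. 4.19 and 4.21 (function names are ours; the ordering of the \(B\) components is chosen so that \(E \cdot B\) reproduces the Plücker-style incidence constraint, which vanishes for any simple bivector \(P \wedge Q\) ):

```python
import numpy as np

def up(p, lam):
    # Eq. 4.19, components stored as [e1, e2, e3, e0]
    p2 = np.dot(p, p)
    return np.append(2 * lam * p / (lam**2 + p2), (lam**2 - p2) / (lam**2 + p2))

def wedge_line(P, Q):
    """Eq. 4.21: the bivector P ^ Q, with coefficients l_ij = P_i Q_j - P_j Q_i,
    ordered [l01, l02, l03, l12, l13, l23] (index 0 denotes e0)."""
    c = lambda V, i: V[3] if i == 0 else V[i - 1]   # component lookup by basis index
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    l = np.array([c(P, i) * c(Q, j) - c(P, j) * c(Q, i) for i, j in pairs])
    return l / np.linalg.norm(l)   # unit coefficients, so that <l l>_0 = -1

lam = 10.0
line = wedge_line(up(np.array([1.0, 0.0, 0.0]), lam),
                  up(np.array([1.0, 1.0, 2.0]), lam))
E = line[:3]                                  # [l01, l02, l03]
B = np.array([line[5], -line[4], line[3]])    # [l23, -l13, l12]
# E · B = 0: the incidence constraint satisfied by simple bivectors
```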

Choice of \(\lambda\) , the space curvature. The balance between the rotational and translational components of motors \(M\) depends on the curvature \(\lambda\) , which is a free parameter in \(G(4, 0, 0)\) (see Fig. 4.11). A larger \(\lambda\) means a flatter space, while a smaller \(\lambda\) corresponds to a more curved space. How large a \(\lambda\) we take depends on the spatial span of the dataset, as we shall see in the Experiments section.

e684118e43323b2c210c8dbc975db92a_img.jpg

Diagram of the Define, Refine, Align (DRA) pipeline. The pipeline starts with a source line bundle L_S and a target line bundle L_T. The DEFINE module (green) uses a Feature Extractor Module to produce embeddings in G(4,0,0) and a Reshape module to produce L_d,1 and L_d,2. The REFINER module (blue) consists of an Equivariant Module phi and a Rotational Module rho, both in G(4,0,0), which process L_d,1 and L_d,2 to produce M_phi,1, M_phi,2, M_rho,1, and M_rho,2. The ALIGN module (pink) concatenates and pools these to create a pose M-tilde, which is then used to estimate the rigid body motion l-tilde' = M-tilde' l-tilde.

Fig. 4.12 Define, Refine, Align (DRA): The DRA pipeline to estimate the pose \(M\) that aligns the source line bundle \(\mathcal{L}_S\) to the target line bundle \(\mathcal{L}_T\) via \(\mathcal{L}_T = M\mathcal{L}_S\tilde{M}\) . The Define module is an attention-based feature extractor to identify putative line matches in the two frames \(\mathcal{L}_{d,1}, \mathcal{L}_{d,2}\) . The Refine module consists of an Equivariant Module \(\phi\) and a Rotational Module \(\rho\) , that map lines to poses in \(G(4, 0, 0)\) . Lastly, the Align module averages poses \(M_{\rho,1}, M_{\rho,2}, M_{\phi,1}, M_{\phi,2}\) and normalizes the output to obtain \(\tilde{M}\) , aligning the two bundles.

4.3.4 Architecture: attentional, rotational and equivariant modules

The DRA architecture is outlined in Fig. 4.12. It consists of three steps, explained below. DRA works entirely in \(G(4, 0, 0)\) , meaning its inputs, outputs, weights and biases are multivectors in the 1D-Up CGA.

Define. The Define module extracts line correspondences \(\mathcal{L}_{d,1}\) and \(\mathcal{L}_{d,2}\) from the source and target bundles \(\mathcal{L}_S\) and \(\mathcal{L}_T\) , respectively. The first part is a simplified feature extractor module derived from [116].

Unordered lines are first processed using MLP blocks inspired by PointNet [152] to extract line-wise features. Subspace coding is then applied to lines in 1D-Up CGA. Bivectors in \(G(4, 0, 0)\) can be decomposed into two sets of components: those containing the conformal basis vector \(e \triangleq e_0\) , i.e. \(\ell_0 = \{\ell_{01}, \ell_{02}, \ell_{03}\}\) , and those that do not, i.e. \(\ell_\emptyset = \{\ell_{12}, \ell_{13}, \ell_{23}\}\) . This is similar to the direction ( \(v\) ) and moment ( \(m\) ) vectors in Plücker coordinates. Two separate networks process \(\ell_0\) and \(\ell_\emptyset\) independently before concatenation. K-nearest neighbor (KNN) graphs are built in each subspace to capture local geometric structures.

Then, the 128-dimensional features from subspace coding are refined using self-attention and cross-attention mechanisms. A multiplex graph representation is used, with intra-frame edges (lines within the same reconstruction) and inter-frame edges (connecting lines across different reconstructions). DRA employs a single pair of self- and cross-attention blocks. Multi-head attention with four heads is used.

The last part is original: for each of the two outputs of shape \(B \times N \times 192\) we perform (i) max pooling, to remove the dependency on the number of lines \(N\) present in the bundle; (ii) reshaping of the features to \(B \times 32 \times 6\) , to obtain 32 lines specified by 6 bivector coefficients; (iii) embedding of the coefficients into \(G(4, 0, 0)\) and (iv) tensor normalisation to guarantee they represent lines, i.e. that \(\ell^2 = -1\) . The Define module is summarised in Fig. 4.13.
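Steps (i)-(iv) can be sketched in a few lines of numpy (a stand-in for the PyTorch implementation; the unit-norm normalisation here enforces only the grade-0 condition \(\langle \ell\ell \rangle_0 = -1\) , while the full bivector constraint is left out of this sketch):

```python
import numpy as np

def define_head(features, n_lines=32):
    """Sketch of steps (i)-(iv): features of shape (B, N, 192) are max-pooled
    over the N input lines, reshaped into 32 candidate lines with 6 bivector
    coefficients each, and normalised to unit coefficient norm."""
    pooled = features.max(axis=1)                        # (B, 192): drop N-dependence
    lines = pooled.reshape(-1, n_lines, 192 // n_lines)  # (B, 32, 6)
    norms = np.linalg.norm(lines, axis=-1, keepdims=True)
    return lines / norms                                 # unit coefficient vectors

bundle = define_head(np.random.default_rng(0).normal(size=(8, 300, 192)))
# output: 8 bundles of 32 unit-normalised candidate lines each
```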

Refine. The Refine module is composed of two components that operate in parallel: the Equivariant Module \(\phi\) and the Rotational Module \(\rho\) . Both modules take as input the line bundles \(\mathcal{L}_{d,1}\) and \(\mathcal{L}_{d,2}\) .

The Equivariant Module, shown in Fig. 4.14, is built from the Clifford Group Equivariant Layers introduced in [158] and in Chapter 3, Section 3.9. Its input is a bundle of lines, i.e. pure bivectors, while its output is a motor, i.e. an even-grade multivector.

4c63a0e17b54c7e61d512c276932114c_img.jpg

Diagram of the Define block. It shows two input line bundles, source (L_S) and target (L_T), each with N lines. Each line is processed through two components: l_0 (local conv + MLP) and l_∅ (local conv + MLP). The outputs are then processed by a self-attention module and a cross-attention module. The final output is a reshaped and normalized representation into two line bundles, L_d,1 and L_d,2, containing fewer, matching lines.

Fig. 4.13 The Define block, adapted from the Feature Extractor of [116]. The \(\ell_0\) and \(\ell_\emptyset\) components of lines \(\ell\) in \(\mathcal{L}_S, \mathcal{L}_T\) are processed separately through cascaded MLPs. Self- and cross-attention modules are used to extract features from a graph representation of the line bundles. Such features are reshaped and normalised into line bundles \(\mathcal{L}_{d,1}, \mathcal{L}_{d,2}\) , containing fewer, matching lines.

90321ffa3cade82e4a57d6250d046c88_img.jpg

Diagram of the Equivariant Module phi. It shows a sequence of four Equivariant Blocks with channel counts C=32, C=64, C=10, and C=1. Each block contains a Multivector Linear Layer, Multivector SiLU, Geometric Product Layer, and Multivector Layer Norm. The input is a line bundle L_d and the output is a pose M_phi.

Fig. 4.14 The Equivariant Module \(\phi\) in the Refine block. It consists of a cascade of MV linear and geometric product layers, yielding poses \(M_\phi\) equivariant to transformations \(\Gamma(\cdot)\) .

In formulae:

\[M_\phi = \phi(\mathcal{L}_d) \quad (4.23)\]

in which \(\phi(\cdot)\) is an equivariant transformation, meaning that for any transformation \(\Gamma(\cdot) \in G(4, 0, 0)\) ,

\[\Gamma(M_\phi) = \Gamma(\phi(\mathcal{L}_d)) = \phi(\Gamma(\mathcal{L}_d)) \quad (4.24)\]

i.e. the same set of lines \(\mathcal{L}_d\) will be transformed into the same pose \(M_\phi\) via \(\phi\) , regardless of the transformation \(\Gamma\) applied to it. This makes \(M_\phi\) , the pose extracted from \(\mathcal{L}_d\) , robust to the bundle's position or orientation in space and to the coordinate system in which it is expressed.

The Equivariant Module is composed of four sub-blocks, each with a different number of channels, and each containing the same four layers: (i) Multivector (MV) linear layer; (ii) MV SiLU; (iii) geometric product layer and (iv) MV layer norm.

Given a set of multivectors \(\{x_i\}_{i=1}^C\) , with \(C\) input channels, the output \(z_j\) of the \(j\) -th channel of an MV linear layer is given by:

\[\langle z_j \rangle_k = \sum_{i=1}^C \varphi_{ijk} \langle x_i \rangle_k, \quad (4.25)\]

426a0ab9454f31706038a0ac0bc37b9c_img.jpg

The diagram illustrates the Rotational Module \(\rho\) . It starts with an input \(\mathcal{L}_d\) (represented by green lines) entering a series of four sandwich product layers, with channel counts \(C=32\) , \(C=64\) , \(C=10\) and \(C=1\) . The output of the final layer is a motor \(M_\rho\) (represented by a red arrow). Above the output, it indicates 'weights \(M_w\) , biases \(M_b\) '.

Fig. 4.15 The Rotational Module \(\rho\) in the Refine block. Lines are transformed into proposals of poses, which are rotated, translated and downsampled into \(M_\rho\) .

in which \(\langle \cdot \rangle_k\) is the \(k\) -grade extractor operator and \(\varphi_{ijk} \in \mathbb{R}\) is a learnable weight.
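Eq. 4.25 can be sketched directly in numpy (a stand-in for the PyTorch layers actually used; the blade ordering and function names are assumptions of this sketch). The defining property, visible in the test below, is that grade-wise mixing cannot create new grades:

```python
import numpy as np

# grade of each of the 16 basis blades of G(4,0,0), ordered as
# [1 | e0..e3 | 6 bivectors | 4 trivectors | e0123]
GRADES = np.array([0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4])

def mv_linear(x, phi):
    """Eq. 4.25: <z_j>_k = sum_i phi_ijk <x_i>_k.
    x:   (..., C_in, 16) multivector channels
    phi: (C_in, C_out, 5) one real weight per (input, output, grade)."""
    w = phi[:, :, GRADES]                    # (C_in, C_out, 16): one weight per blade
    return np.einsum('...ic,ijc->...jc', x, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 16))              # batch of 4, 8 input channels
x[..., GRADES != 2] = 0.0                    # keep only bivector (grade-2) parts
z = mv_linear(x, rng.normal(size=(8, 3, 5)))
# z mixes channels grade by grade: purely grade-2 inputs stay purely grade-2
```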

Geometric product layers, on the other hand, capture pairwise interactions, ensuring that lines within the same bundle interact to yield a final pose estimate. Given a learnable linear combination of the inputs

\[y_i = \sum_{p=1}^C \beta_{pi} x_p, \quad (4.26)\]

the output of the \(j\) -th channel, \(z_j\) , obeys

\[\langle z_j \rangle_k = \sum_{i=1}^C \sum_{p=1}^C \varphi_{ijk} \langle x_i (\beta_{pi} x_p) \rangle_k \quad (4.27)\]

in which \(\varphi_{ijk} \in \mathbb{R}\) and \(\beta_{pi} \in \mathbb{R}\) are learnable weights. Activations and normalisations are applied grade-wise.

The Rotational Module \(\rho\) , shown in Fig. 4.15, is derived from the Geometric Clifford Algebra Networks of [159] and works in the same way as the layers in [142]. Like the Equivariant Module, it takes a collection of bivectors as input, while its output is a single motor, i.e.:

\[M_\rho = \rho(\mathcal{L}_d) \quad (4.28)\]

The Rotational Module is a cascade of four sandwich product layers, for which the input-output relationship is analogous to that in Eq. 4.11, and given by:

\[z_j = \sum_{i=1}^C \left( M_{w;ij} x_i \tilde{M}_{w;ij} + M_{b;ij} \right) \quad (4.29)\]

in which \(M_w\) and \(M_b\) are even-grade weights and biases, guaranteeing that each output is motor-like, i.e. that it contains only even-grade terms. After the first layer, the 32 lines in \(\mathcal{L}_d\) are transformed into 64 pose proposals, which are then rotated, translated and downsampled until convergence to \(M_\rho\) .

Align. The Align module creates the motor \(\hat{M}\) that aligns \(\mathcal{L}_S\) to \(\mathcal{L}_T\) . The four motors in the output of the Refine module, \(M_{\phi,1}, M_{\phi,2}, M_{\rho,1}, M_{\rho,2}\) , are concatenated and pooled to obtain an average motor \(M_\mu\) :

\[M_\mu = \text{AvgPool}(M_{\phi,1}, M_{\phi,2}, M_{\rho,1}, M_{\rho,2}) \quad (4.30)\]

Eventually, the motor is normalised as

\[\hat{M} = \frac{M_\mu}{\sqrt{\langle M_\mu \tilde{M}_\mu \rangle_0}} \quad (4.31)\]

to guarantee it represents a pose, obtaining the estimated motor \(\hat{M}\) .
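In coefficient form, Eqs. 4.30-4.31 amount to a componentwise average followed by a Euclidean normalisation: in an orthonormal, positive-signature basis, \(\langle M\tilde{M} \rangle_0\) is simply the squared Euclidean norm of the 8 motor coefficients. A sketch (noting that componentwise averaging is only a first-order approximation to averaging rigid motions):

```python
import numpy as np

def align(motors):
    """Eqs. 4.30-4.31: average a stack of motors (8 coefficients each:
    scalar, 6 bivectors, e0123) and normalise by sqrt(<M M~>_0), which
    in this representation equals the coefficients' Euclidean norm."""
    m = motors.mean(axis=0)            # AvgPool over the 4 proposals
    return m / np.linalg.norm(m)

rng = np.random.default_rng(1)
proposals = rng.normal(size=(4, 8))    # M_phi,1, M_phi,2, M_rho,1, M_rho,2
M_hat = align(proposals)               # unit-norm motor coefficients
```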

DRA is extremely lightweight: it computes the relative pose between two line bundles in a few hundredths of a second, and it has slightly above 1 million trainable parameters, as opposed to the current correspondence-guided state-of-the-art architecture, MEGNet, which has \(4\times\) as many (see Table 4.4).

Table 4.4 Number of Parameters for Different Methods

| Method | PlückerNet | PlückerNet Regr. | MEGNet | MEGNet-mini | DRA (ours) |
|---|---|---|---|---|---|
| # Parameters (M) | 2.229 | 1.598 | 5.237 | 1.571 | 1.384 |

4.3.5 Datasets: Structured3D and Semantic3D

We follow the approach of [116] and use two datasets for our experiments: Structured3D and Semantic3D.

Structured3D [200] consists of 3D indoor house annotations, including junctions and lines. It contains 3,500 house scenes, with an average of 306 lines per scene (median: 312). The typical house size is approximately \(11\text{m} \times 10\text{m} \times 3\text{m}\) . Since this dataset focuses on indoor structures, most lines are either parallel or perpendicular. The dataset is split into 2,975 training and 525 testing scenes.

Semantic3D [60] includes large-scale, densely scanned 3D point clouds of urban environments. We use the semantic-8 subset, which has 30 scans totaling over a billion points. A fast 3D line detection method extracts line segments, which are then grouped into geographical cells of \(10\text{m} \times 10\text{m}\) in the X-Y plane. Scenes with fewer than 20 lines are removed, resulting in 1,981 total scenes (average: 676 lines, median: 118). The dataset is split into 1,683 training and 298 testing scenes.

Random rigid transformations are applied to 3D lines in each scene, with rotations ranging from \(0^\circ\) to \(45^\circ\) and translations from \(-2\text{m}\) to \(2\text{m}\) . Gaussian noise is added to both the source and target line sets, perturbing footprint points by up to \(\pm 0.25\text{m}\) and slightly rotating directions by up to \(\pm 5^\circ\) . To simulate partial scans, 70% of the lines in both sets are randomly retained, resulting in an overlap of approximately 70%.
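The augmentation protocol above can be sketched as follows (our naming and simplifications: the separate \(\pm 5^\circ\) direction jitter is folded into the endpoint noise here, and segments are stored as endpoint pairs):

```python
import numpy as np

def augment(lines, rng):
    """Random rigid transform (rotation up to 45 deg, translation in
    [-2, 2] m), endpoint jitter of up to +/-0.25 m, and random retention
    of ~70% of the lines. `lines`: (N, 2, 3), N segments with 2 endpoints."""
    # random axis-angle rotation via Rodrigues' formula
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    theta = rng.uniform(0, np.deg2rad(45))
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K
    t = rng.uniform(-2.0, 2.0, size=3)
    out = lines @ R.T + t                               # rigid motion
    out += rng.uniform(-0.25, 0.25, size=out.shape)     # endpoint noise
    keep = rng.random(len(out)) < 0.7                   # ~70% partial overlap
    return out[keep]

rng = np.random.default_rng(0)
target = augment(rng.normal(size=(300, 2, 3)), rng)
```

Applying the routine independently to the source and target copies of a scene yields two partially overlapping, mutually transformed bundles, as in the experiments.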

4.3.6 Error metrics

The error metrics are consistent with [116, 80]. We decompose the estimated motor \(\hat{M}\) into its rotation and translation components, the rotors \(\hat{R}\) and \(\hat{T}\) , and then convert them into the corresponding rotation matrix \(\hat{\mathbf{R}}\) and translation vector \(\hat{\mathbf{t}}\) , respectively. The rotation error is given by:

\[\epsilon_\rho = \arccos \left( \frac{\text{tr}(\mathbf{R}^T \hat{\mathbf{R}}) - 1}{2} \right) \quad (4.32)\]

while the translation error is measured as the \(L_2\) distance between the ground truth and estimated translation vectors, i.e.

\[\epsilon_t = \|\mathbf{t} - \hat{\mathbf{t}}\|_2 \quad (4.33)\]
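The two metrics translate directly into code (function names are ours; the arccos argument is clipped to guard against floating-point rounding):

```python
import numpy as np

def rotation_error_deg(R_gt, R_est):
    """Eq. 4.32: angle of the residual rotation R_gt^T R_est, in degrees."""
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error(t_gt, t_est):
    """Eq. 4.33: L2 distance between translation vectors, in metres."""
    return np.linalg.norm(t_gt - t_est)

# a 10° residual rotation about z, and a 0.5 m translation offset
c, s = np.cos(np.deg2rad(10)), np.sin(np.deg2rad(10))
R_est = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])
err_r = rotation_error_deg(np.eye(3), R_est)                       # 10 degrees
err_t = translation_error(np.zeros(3), np.array([0.3, 0.0, 0.4]))  # 0.5 m
```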

f9898343c953997a111533ec24a9f1a1_img.jpg

Figure 4.16 displays two rows of 3D line alignment results. Each row contains four subplots labeled (a) through (d).
Row 1 (top):
(a) shows source line bundle \(\mathcal{L}_S\) (red) and target line bundle \(\mathcal{L}_T\) (black).
(b) shows \(\mathcal{L}_S\) rotated and translated with ground truth motor \(M\) (blue) and estimated motor \(\hat{M}\) in high curvature space ( \(\lambda = 1\) ). A green arrow indicates "larger R uncertainty".
(c) shows \(\mathcal{L}_S\) rotated and translated with estimated motor \(\hat{M}\) in low curvature space ( \(\lambda = 50\) ). A yellow arrow indicates "larger t uncertainty".
(d) shows \(\mathcal{L}_S\) rotated and translated with estimated motor \(\hat{M}\) using combined curvatures (red).
Row 2 (bottom):
(a) shows source line bundle \(\mathcal{L}_S\) (red) and target line bundle \(\mathcal{L}_T\) (black).
(b) shows \(\mathcal{L}_S\) rotated and translated with ground truth motor \(M\) (blue) and estimated motor \(\hat{M}\) in high curvature space ( \(\lambda = 1\) ). A green arrow indicates "larger R uncertainty".
(c) shows \(\mathcal{L}_S\) rotated and translated with estimated motor \(\hat{M}\) in low curvature space ( \(\lambda = 50\) ). A yellow arrow indicates "larger t uncertainty".
(d) shows \(\mathcal{L}_S\) rotated and translated with estimated motor \(\hat{M}\) using combined curvatures (red).

Fig. 4.16 Two alignment examples (top and bottom) for the test set of Structured3D. (a) The source and target line bundles \(\mathcal{L}_S, \mathcal{L}_T\) ; (b)-(c)-(d) \(\mathcal{L}_S\) rotated and translated with ground truth motor \(M\) (blue) and with estimated motor \(\hat{M}\) in (b) high curvature space (small \(\lambda\) ); (c) low curvature space (large \(\lambda\) ); (d) combined curvatures (red).

Table 4.5 Results for the correspondence-free and correspondence-based line registration problem.

Structured3D:

| Category | Method | Rotation (°) Q1 | Q2 | Q3 | Mean | Translation (m) Q1 | Q2 | Q3 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Corresp.-free | PlückerNet Regr. [116] | 2.436 | 5.610 | 4.935 | 5.313 | 0.151 | 0.240 | 0.367 | 0.402 |
| Corresp.-free | GA-PlückerNet Regr. (ours) | 0.302 | 1.375 | 1.812 | 1.589 | 0.038 | 0.060 | 0.098 | 0.087 |
| Corresp.-free | DRA (ours) | 0.886 | 1.282 | 1.784 | 1.526 | 0.036 | 0.058 | 0.094 | 0.083 |
| Corresp.-based | ICL [7] | 0.353 | 0.520 | 0.795 | 0.544 | 0.090 | 0.044 | 0.078 | 0.041 |
| Corresp.-based | RORNet [189] | 0.030 | 0.063 | 0.146 | 0.329 | 0.014 | 0.020 | 0.027 | 0.023 |
| Corresp.-based | PlückerNet [116] | 0.030 | 0.062 | 0.111 | 0.249 | 0.013 | 0.018 | 0.025 | 0.021 |
| Corresp.-based | MEG-Net mini [170] | 0.033 | 0.064 | 0.121 | 0.294 | 0.012 | 0.017 | 0.025 | 0.020 |
| Corresp.-based | MEG-Net [170] | 0.030 | 0.058 | 0.104 | 0.200 | 0.012 | 0.017 | 0.024 | 0.019 |

Semantic3D:

| Category | Method | Rotation (°) Q1 | Q2 | Q3 | Mean | Translation (m) Q1 | Q2 | Q3 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Corresp.-free | PlückerNet Regr. [116] | 15.90 | 20.93 | 24.95 | 26.30 | 13.47 | 1.571 | 2.281 | 2.569 |
| Corresp.-free | GA-PlückerNet Regr. (ours) | 7.652 | 12.71 | 17.48 | 14.35 | 0.148 | 0.232 | 0.380 | 0.285 |
| Corresp.-free | DRA (ours) | 7.481 | 12.58 | 17.35 | 14.12 | 0.144 | 0.227 | 0.375 | 0.278 |
| Corresp.-based | ICL [7] | 0.803 | 2.050 | 5.205 | 18.40 | 0.085 | 0.226 | 0.866 | 1.193 |
| Corresp.-based | RORNet [189] | 0.396 | 1.113 | 1.946 | 6.342 | 0.369 | 0.524 | 0.711 | 0.575 |
| Corresp.-based | PlückerNet [116] | 0.513 | 0.952 | 1.092 | 5.574 | 0.314 | 0.451 | 0.616 | 0.489 |
| Corresp.-based | MEG-Net mini [170] | 0.533 | 0.942 | 1.637 | 5.879 | 0.315 | 0.453 | 0.614 | 0.489 |
| Corresp.-based | MEG-Net [170] | 0.510 | 0.918 | 1.516 | 4.220 | 0.302 | 0.448 | 0.611 | 0.483 |

Table 4.6 Relative performance of DRA compared to other methods. Negative values indicate that DRA's error is lower, while positive values indicate that DRA's error is higher.

Structured3D:

| Category | Method | Rotation (°) Q1 | Q2 | Q3 | Mean | Translation (m) Q1 | Q2 | Q3 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Corresp.-free | PlückerNet Regr. [116] | -63.6% | -64.4% | -63.9% | -71.2% | -76.2% | -75.8% | -74.4% | -79.3% |
| Corresp.-free | GA-PlückerNet Regr. | +1.8% | -6.8% | +1.6% | +4.0% | +5.3% | -3.3% | +4.1% | +4.6% |
| Corresp.-based | ICL [7] | 40.2% | -50.4% | -55.4% | +57.4% | +16.7% | -24.1% | -17.0% | -70.6% |
| Corresp.-based | RORNet [189] | 90.6% | -95.2% | -91.8% | -78.4% | -61.1% | -65.5% | -71.3% | -72.3% |
| Corresp.-based | PlückerNet [116] | 90.6% | -95.2% | -90.9% | -83.7% | -45.9% | -60.0% | -73.4% | -74.7% |
| Corresp.-based | MEG-Net mini [80] | 96.3% | -95.0% | -95.2% | -80.8% | -66.7% | -70.7% | -73.4% | -75.9% |
| Corresp.-based | MEG-Net [80] | 90.6% | -95.3% | -91.2% | -86.9% | -66.7% | -70.7% | -71.3% | -77.1% |

Semantic3D:

| Category | Method | Rotation (°) Q1 | Q2 | Q3 | Mean | Translation (m) Q1 | Q2 | Q3 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Corresp.-free | PlückerNet Regr. [116] | -53.0% | -39.9% | -30.4% | -46.3% | -89.3% | -87.8% | +83.5% | -80.1% |
| Corresp.-free | GA-PlückerNet Regr. | +3.2% | +1.0% | -0.7% | +1.6% | -2.7% | +2.2% | -1.2% | -2.3% |
| Corresp.-based | ICL [7] | -30.3% | -83.7% | -70.6% | -41.6% | -0.4% | +120.7% | +329.1% | — |
| Corresp.-based | RORNet [189] | -92.1% | -91.1% | -88.8% | -55.1% | +156.3% | +130.8% | +80.6% | +106.8% |
| Corresp.-based | PlückerNet [116] | -92.2% | -92.4% | -91.1% | -40.5% | +118.1% | +98.7% | +64.3% | +70.0% |
| Corresp.-based | MEG-Net mini [80] | -92.5% | -92.5% | — | -58.4% | +118.8% | +99.6% | +63.7% | +76.0% |
| Corresp.-based | MEG-Net [80] | -92.2% | -92.7% | -91.3% | -70.1% | +109.7% | +97.1% | +62.9% | +73.7% |

4.3.7 Implementation details

Define, Refine, Align has been implemented in PyTorch and is trained using the Adam optimizer with a learning rate of \(\eta = 10^{-4}\) and a batch size of \(B = 8\) for 24 hours on a single NVIDIA GeForce RTX 4090 GPU. The number of nearest neighbors for each line in the Define block is set to 10.

Since our pipeline estimates poses directly, there is no need to tune parameters of offline algorithms such as the Sinkhorn and RANSAC algorithms as in [116, 80]. The equivariant Clifford layers have been derived from [158], while sandwich product layers have been implemented via Torch-GA [136, 82]. GA operations that did not require gradients, like mapping the datasets to \(G(4, 0, 0)\) and visualizing results, have been handled through Clifford [61].

Loss function. The objective function to be minimised is simply the mean squared error (MSE) between the ground truth motors \(M\) and the estimated motors \(\hat{M}\) . For a batch with \(n\) elements, it is defined as:

\[\mathcal{L} = \frac{1}{n} \sum_{i=1}^n \sum_{j=0,2,4} \left( \langle M^{(i)} \rangle_j - \langle \hat{M}^{(i)} \rangle_j \right)^2 \quad (4.34)\]

This guarantees that the estimated motor coefficients are numerically close to the ground truth motor coefficients. We have also tested the loss \(\mathcal{L}_\rho = \frac{1}{n} \sum_{i=1}^n (1 - \langle M^{(i)} \tilde{\hat{M}}^{(i)} \rangle_0)\) , which guarantees that \(\hat{M}\) corresponds to the same rigid body motion as \(M\) , since \(M\tilde{M} = 1\) , but found it to be equivalent to Eq. 4.34.
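On the 8 stored motor coefficients (which are exactly the grade-0, grade-2 and grade-4 parts), Eq. 4.34 reduces to a plain MSE over coefficient vectors; a sketch with illustrative names:

```python
import numpy as np

def motor_mse(M_gt, M_est):
    """Eq. 4.34: squared error over the 8 motor coefficients
    (grades 0, 2 and 4), averaged over a batch of n motors."""
    return np.mean(np.sum((M_gt - M_est) ** 2, axis=-1))

M_gt = np.tile(np.array([1.0, 0, 0, 0, 0, 0, 0, 0]), (4, 1))  # identity motors
M_est = M_gt + 0.1                       # uniform 0.1 offset on every coefficient
loss = motor_mse(M_gt, M_est)            # 8 coefficients * 0.01 = 0.08
```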

4.3.8 Results

Results are summarised in Table 4.5. We report rotation and translation errors for the Structured3D and Semantic3D datasets, including three quartiles (Q1, Q2, Q3) and the mean error, following the standard of previous literature [116]. We compare DRA to correspondence-free and, for the sake of completeness, to correspondence-based methods, distinguishing between models that estimate the pose directly and those that first compute the correspondence matrix \(\mathbf{C}\) .

The correspondence-free baseline is represented by the Regression network introduced in [116] (PlückerNet Regression), composed of a Feature Extractor block and linear layers to directly estimate \((\mathbf{R}, \mathbf{t})\) . We also test the same architecture, modified to take as input line bundles represented as bivectors in \(G(4, 0, 0)\) and to estimate motors (i.e. a real-valued network operating on quantities in spherical space); this variant is reported as GA-PlückerNet Regression in Table 4.5. Correspondence-based methods include the ICL algorithm [7], PlückerNet [116], RORNet [189], MEG-Net and MEG-Net mini [80].

d2fc2f1879ae64e0fd49e163908bc95c_img.jpg

Figure 4.17: Two alignment examples (top and bottom) for the test set of Semantic3D. Each row shows four subplots (a-d). (a) shows source (red) and target (black) line bundles L_S and L_T. (b) shows the bundles after rotation and translation with ground truth motor M (blue) and estimated motor M_hat in high curvature space (small lambda). (c) shows the bundles after rotation and translation with ground truth motor M (blue) and estimated motor M_hat in low curvature space (large lambda). (d) shows the bundles after rotation and translation with ground truth motor M (blue) and estimated motor M_hat in combined curvatures (red).

Fig. 4.17 Two alignment examples (top and bottom) for the test set of Semantic3D. (a) The source and target line bundles \(\mathcal{L}_S, \mathcal{L}_T\) ; (b)-(c)-(d) \(\mathcal{L}_S\) rotated and translated with ground truth motor \(M\) (blue) and with estimated motor \(\hat{M}\) in (b) high curvature space (small \(\lambda\) ); (c) low curvature space (large \(\lambda\) ); (d) combined curvatures (red).

Results for the DRA pipeline have been obtained by training the pipeline twice, for large and small values of \(\lambda\) (see Fig. 4.11), and combining the rotation component learnt in a low curvature space with the translation component learnt in a high curvature space: examples of the DRA alignment procedure and of the impact that \(\lambda\) has on the estimated pose are shown in Figs. 4.16-4.17 for the Structured3D and Semantic3D datasets, respectively. Note that the high \(\lambda\) case for the Semantic3D dataset required a larger number of channels in the Define module, namely \(C = \{32, 128, 64, 1\}\) . The relative performance of the DRA pipeline compared to other methods is reported in Table 4.6.

DRA emerges as the best-performing correspondence-free pipeline, reducing the mean rotation error by 71.2% and 46.3% compared to PlückerNet Regression for the Structured3D and Semantic3D datasets, respectively. Similarly, the translation error is reduced by 79.3% and 89.1% for the two datasets. Surprisingly, a simple real-valued neural network like GA-PlückerNet Regression achieves performance comparable to the DRA pipeline in \(G(4, 0, 0)\) , suggesting that choosing a spherical space representation might already be enough to build a correspondence-free pipeline robust to partially overlapping sets.

Naturally, methods that explicitly downsample the line bundles and perform registration solely on matching lines outperform DRA. As already mentioned in the Related Work section, however, such methods require an iterative optimisation, are often sensitive to the algorithm initialisation, and are not suitable for real-time operation. ICL, for example, fails in the presence of noise or outliers, as shown by the large mean errors reported, over which DRA shows up to a \(5\times\) improvement.

From Table 4.6 it is also clear that a curved space, as opposed to flat 3D space, can improve the quality of translation estimation even over correspondence-based methods when dealing with a realistic, outdoor dataset for which the translation vector \(\mathbf{t}\) can be very large.

4.4 Conclusion

In this chapter, we studied the problem of pose estimation in 3D space and introduced two geometry-aware deep learning approaches: CGAPoseNet+GCAN for camera pose estimation and Define, Refine, Align (DRA) for 3D line registration.

In Section 4.2, we proposed CGAPoseNet+GCAN, an extension of CGAPoseNet that incorporates a Geometric Clifford Algebra Network (GCAN) to improve camera pose predictions. Instead of directly regressing pose parameters, the backbone generates proposals, which are refined through GCAN layers that apply geometric transformations in Clifford Geometric Algebra. These sandwich product layers preserve object grades and provide interpretable outputs, ensuring that predictions remain in the correct mathematical space. By leveraging this geometry-aware approach, CGAPoseNet+GCAN improves accuracy over both CGAPoseNet and PoseNet, reducing the pose regression error while requiring no additional scene information. It achieves state-of-the-art results while reducing the number of trainable parameters by 17% and maintaining the same computational cost. This demonstrates the potential of GCANs in simplifying and enhancing deep learning approaches for computer vision tasks where geometric information is essential.

In Section 4.3, we introduced DRA, a hypercomplex neural network pipeline that operates in 1D-Up Conformal Geometric Algebra (CGA) to align partially overlapping line bundles across different frames without explicitly computing correspondences. By representing inputs, outputs, and trainable parameters in spherical space, DRA introduces an inductive bias that enables precise alignment without prior correspondence estimation. This makes it particularly suitable for real-time applications. Compared to the best correspondence-free architecture, PlückerNet Regression, DRA reduces pose estimation error by at least 30% and up to 90%.

Chapter 5

Partial Differential Equations (PDEs)

As free as the wind,
hopefully learning
why the sea on the tide
has no way of turning.


— “More Than This”, Roxy Music
(1982)

5.1 Introduction

In this chapter, we explore the application of GA networks to solve partial differential equations (PDEs). The renewed interest in this type of network arose precisely due to its potential in PDE modelling [17, 157, 158]. We build on recent progress in the field of GA networks applied to PDEs to better understand their working principles and better characterise them. We will focus on the Navier-Stokes equations, to model fluid dynamics in 2D and 3D, and on Maxwell's equations in 2D and 3D.

Each Section is centered around a theme: Section 5.3 explores the challenges arising from applying non-linearities to multivectors. We do so by designing the GA-ReLU activation function, tailored for multivector-valued networks, which enhances performance in 2D Navier-Stokes problems while highlighting the significance of nonlinearities that respect the structure of multivectors.

Section 5.4 is focused on geometry-to-physics mapping and architecture design: moving to 3D, we introduce Fengbo, a powerful neural operator pipeline that efficiently tackles large-scale PDEs over complex geometries using 3D Clifford Algebra, achieving superior accuracy with fewer parameters compared to existing models.

Lastly, in Section 5.5 we discuss the importance of the choice of the right algebra in which to embed data: we present STAResNet, a ResNet-inspired architecture that operates within the Spacetime Algebra framework, demonstrating its ability to outperform vanilla Clifford ResNet architectures in solving Maxwell's equations.

For better readability, we split this Chapter into two parts, focusing on different PDEs: Part I focuses on Computational Fluid Dynamics and the solution of Navier-Stokes equations, while Part II focuses on Electromagnetism.

5.2 Background

Many natural phenomena and complex systems, including electromagnetism, quantum mechanics and seismic waves can all be described by partial differential equations (PDEs). Being able to solve such PDEs means being able to predict the evolution of the system's state over time, which is highly desirable in a variety of applications, such as estimating stock prices or weather forecasting. While PDEs provide an exact mathematical representation of these systems, they are often too complex to solve analytically. As a result, numerical methods are employed to model and simulate these systems.

Finite element analysis (FEA) and finite difference methods (FDM) are among the well-established computational techniques used to resolve the complex geometries and boundary conditions associated with PDEs, producing highly accurate solutions [148, 115, 52]. However, this accuracy often comes at the cost of significant computational resources, especially when high resolution is required to capture finer details.

In the past decade, machine learning (ML) has emerged as a powerful tool for solving PDEs [26, 87, 186]. ML-based methods can be several orders of magnitude faster than traditional numerical approaches, allowing for rapid simulations and predictions: the use of ML can dramatically reduce computational time while maintaining a reasonable level of accuracy. This is particularly useful in applications where real-time or near-real-time predictions are needed, such as in weather forecasting or fluid dynamics simulations. Most ML approaches try to blend physical laws and a large amount of data to efficiently approximate PDE solutions while significantly cutting down on computational costs.

Learning methods in PDE modelling. A key challenge in applying ML to PDEs is ensuring that the model does not simply perform pattern recognition. Instead, the objective is for the model to capture the underlying physical principles governing the PDEs and accurately represent the geometry of the domain in which these equations are defined. Consequently, most models designed for PDEs are built to address these requirements.

Physics-Informed Neural Networks (PINNs) [153, 154, 37], for example, do so by incorporating the governing PDEs into the neural network training process and learning directly from them. This integration helps ensure that the solutions respect physical constraints and produce realistic outcomes, addressing the limitation of simpler ML models that might fail to generalise on unseen data. However, each PINN is tailored to a specific PDE and often requires an additional Graph Network for spatial generalisation.

A similar philosophy is found in solver-in-the-loop methods [174, 16, 114]. These hybrid methods combine an ML architecture with a relatively simple numerical solver, which helps refine the predictions of the ML model, ensuring that the solutions remain grounded in the physical constraints of the problem. Deep Galerkin Method (DGM) algorithms [166, 108, 6] fall into the same category. DGM algorithms are trained to satisfy the differential operator, initial conditions, and boundary conditions, proving to be particularly suitable for dealing with high-dimensional PDEs.

Recently, neural operators emerged as a key architecture to tackle the problem of PDE modelling [113, 111, 122, 155, 8]. Neural operators differ from neural networks in that they learn mappings between function spaces, or domains, instead of being pointwise function approximators. When tackling PDEs, neural operators learn a mapping from input functions, which represent the initial or boundary conditions, to output functions, which represent the solution to the PDEs. They come in several versions: Fourier Neural Operators, for example, operate in the frequency domain, where convolutions are more efficient at capturing long-range dependencies and periodic patterns in the data [113, 110, 112], but Convolutional, Laplace and Graph neural operators have also been reported in the literature to address specific problem requirements. GINO [112], for example, is a pipeline combining a Graph neural operator, which handles irregular shapes and maps them onto a regular grid in latent space, with a Fourier neural operator, which processes the transformed input in latent space; this pipeline achieves state-of-the-art performance on large-scale 3D PDEs.

Part I: Computational Fluid Dynamics

5.3 GA-ReLU: an activation function for Navier-Stokes PDEs

In this Section we focus on the issue of non-linearities in GA networks, since applying them to multivectors is not a trivial task: they are generally applied in a point-wise fashion over each real-valued component of a multivector. This approach discards interactions between different elements of the multivector input and compromises the geometric nature of GA networks. To bridge this gap, we propose GA-ReLU, a GA approach to the rectified linear unit (ReLU), and show how it can improve the solution of Navier-Stokes PDEs.

5.3.1 Problem definition

We wish to define a non-linear function \(\psi(\mathbf{x}) : \mathcal{G}_n \rightarrow \mathcal{G}_n\) for networks operating in GA in order to extend the geometric flavour of the approach also to the activation function. In particular, we refer to the Navier-Stokes PDE problem as formulated in [17] and train two networks, namely the Clifford ResNet and the Clifford Fourier Neural Operator (FNO), each with both ReLU and GA-ReLU activation functions.

The incompressible Navier-Stokes equations in 2D are given by:

\[\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla) \mathbf{u} = -\frac{1}{\rho} \nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{f}, \quad \nabla \cdot \mathbf{u} = 0, \quad (5.1)\]

in which \(\frac{\partial \mathbf{u}}{\partial t}\) is the time derivative of the fluid velocity vector \(\mathbf{u} = (u_x, u_y)\) , \((\mathbf{u} \cdot \nabla) \mathbf{u}\) is the convective term, \(-\frac{1}{\rho} \nabla p\) is the pressure gradient, \(\nu \nabla^2 \mathbf{u}\) is the viscous diffusion and \(\mathbf{f}\) is the external force term. The incompressibility of the fluid is ensured by \(\nabla \cdot \mathbf{u} = 0\) .

Since there exists a coupling between vector quantities (the velocity \(\mathbf{u}\) ) and scalar ones (the pressure field, or the density of particles advected by \(\mathbf{u}\) ), it makes sense to express this coupling by “wrapping” together the vector and scalar information as a single multivector of the type:

\[\mathbf{x} = s + u_{e_1} e_1 + u_{e_2} e_2, \quad (5.2)\]

where \(s\) is the scalar pressure field and \(u_{e_1}\) and \(u_{e_2}\) are the components of \(\mathbf{u}\) along the \(e_1\) and \(e_2\) direction, respectively. Given two multivectors \(\{\mathbf{x}_{t_i}, \mathbf{x}_{t_{i+1}}\}\) at two different time instants \(t_i, t_{i+1}\) , we wish to use a ML pipeline that estimates \(\mathbf{x}_{t_{i+2}}\) . This multivector approach has been demonstrated to be more successful than estimating \((s, u_x, u_y)_{t+2}\) independently through a network not in GA, since the coupling between different


Figure 5.1: Three 3D surface plots (a), (b), and (c) showing the vector part of different ReLU functions. Plot (a) shows a coefficient-wise ReLU function with a sharp, piecewise linear surface. Plot (b) shows a phase-dependent ReLU function with a smoother, more continuous surface. Plot (c) shows a GA-ReLU function with a surface that is smoother than (a) but has more complex, curved features. All plots are set against a grid on the e1-e2 plane.

Fig. 5.1 Vector part of the (a) coefficient-wise ReLU \(\phi(\mathbf{x})\) (b) phase-dependent ReLU \(f(\mathbf{x})\) and (c) GA-ReLU \(\psi(\mathbf{x})\) . The scalar part, not reported, is identical for all three functions and equal to \(\phi(s)\) .

geometric quantities has to be inferred by the network rather than being explicitly expressed through a multivector structure [17]. We wish to keep the same coupling also when applying non-linearities.

We want to design \(\psi(\mathbf{x})\) in such a way that (i) it preserves the behaviour of the equivalent activation function defined over \(\mathbb{R}\) , and (ii) it is able to differentiate between grades and model interactions between components of the same grade.

An activation function \(\phi\) (that we will assume from now on to be the ReLU function) has commonly been applied to a multivector \(\mathbf{x}\) element-wise, i.e.:

\[\phi(\mathbf{x}) = \phi(s) + \phi(u_{e_1})e_1 + \phi(u_{e_2})e_2 \quad (5.3)\]

This approach satisfies (i), but not (ii) (see Fig. 5.1a). To fill this gap, we introduce GA-ReLU. GA-ReLU is the composition of (1) a coefficient-wise ReLU, which depends on the magnitude of each multivector coefficient, and (2) a phase-dependent ReLU, which captures the interaction between vector coefficients and depends on their phase difference. We derive GA-ReLU below.
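As a concrete reference before the derivation, the coefficient-wise application of Eq. 5.3 amounts to applying the real ReLU to each stored coefficient independently (a minimal numpy sketch; the layout \([s, u_{e_1}, u_{e_2}]\) is an assumed storage convention):

```python
import numpy as np

def coefficient_wise_relu(x):
    """Eq. 5.3: apply the real ReLU independently to each multivector
    coefficient [s, u_e1, u_e2]; any interaction between the
    components is discarded."""
    return np.maximum(x, 0.0)

# A multivector x = s + u_e1 e1 + u_e2 e2 stored as 3 coefficients:
x = np.array([-0.5, 2.0, -1.0])
y = coefficient_wise_relu(x)  # negative coefficients are zeroed
```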

We look at the complex domain and express our 2D multivector in terms of complex numbers. The phase-dependent ReLU is inspired by the complex cardioid activation function first introduced in [180], defined as

\[f(z) = \frac{1}{2} (1 + \cos(\arg(z))) z = K(\arg(z))z, \quad (5.4)\]


| Loss Type | Training Data | Clifford ResNet + ReLU | Clifford ResNet + GA-ReLU |
|---|---|---|---|
| One-Step Loss | 15600 | ~1.00E-02 | ~1.00E-02 |
| | 10400 | ~1.20E-02 | ~1.10E-02 |
| | 5200 | ~1.80E-02 | ~1.70E-02 |
| | 2080 | ~3.10E-02 | ~3.00E-02 |
| Scalar Loss | 15600 | ~1.50E-02 | ~1.50E-02 |
| | 10400 | ~1.70E-02 | ~1.60E-02 |
| | 5200 | ~2.30E-02 | ~2.20E-02 |
| | 2080 | ~3.80E-02 | ~3.70E-02 |
| Vector Loss | 15600 | ~9.00E-03 | ~9.00E-03 |
| | 10400 | ~1.00E-02 | ~9.50E-03 |
| | 5200 | ~1.50E-02 | ~1.40E-02 |
| | 2080 | ~2.40E-02 | ~2.30E-02 |

Figure 5.2: MSE error with error bars versus number of training data for Clifford ResNet with ReLU and GA-ReLU activation functions. The figure consists of three bar charts: One-Step Loss, Scalar Loss, and Vector Loss. Each chart compares Clifford ResNet + ReLU (red bars) and Clifford ResNet + GA-ReLU (blue bars) for training data sizes of 15600, 10400, 5200, and 2080. In all cases, GA-ReLU shows lower MSE error than ReLU.

Fig. 5.2 MSE error with error bars versus number of training data for Clifford ResNet with ReLU and GA-ReLU activation functions.

in which \(z = a + bi \in \mathbb{C}\) , \(i^2 = -1\) and \(K\) is an attenuation function dependent on the argument of \(z\) . The complex cardioid is an extension of the ReLU function over \(\mathbb{C}\) and depends only on the phase of the input rather than on its magnitude.
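The cardioid attenuation of Eq. 5.4 is straightforward to sketch in numpy (illustrative only; `complex_cardioid` is our name for it, not one taken from [180]):

```python
import numpy as np

def complex_cardioid(z):
    """Eq. 5.4: f(z) = 0.5 * (1 + cos(arg z)) * z.
    A phase-only extension of ReLU to the complex plane:
    positive reals pass unchanged, negative reals are zeroed,
    purely imaginary inputs are attenuated by one half."""
    return 0.5 * (1.0 + np.cos(np.angle(z))) * z

out = complex_cardioid(np.array([3.0 + 0j, -3.0 + 0j, 2j]))
```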

It is known that the 2D Geometric Algebra \(\mathcal{G}_{2,0}\) is isomorphic to \(\mathbb{C}\) by simply taking \(I \triangleq e_{12}\) as our imaginary unit, since \(e_{12}^2 = (e_1 e_2)^2 = e_1 e_2 e_1 e_2 = -e_1 e_1 e_2 e_2 = -1 = i^2\) . By keeping this in mind, it is easy to see that a generic multivector \(\mathbf{x} = x_0 + x_1 e_1 + x_2 e_2 + x_{12} e_{12} \in \mathcal{G}_{2,0}\) can be decomposed into “a sum of two complex numbers” as follows:

\[\mathbf{x} = (x_0 + Ix_{12}) + e_1(x_1 + Ix_2) = z_S + e_1 z_V. \quad (5.5)\]

Following [17], we will refer to \(z_S\) as the spinor part and to \(z_V\) as the vector part. We can then evaluate Eq.5.4 on \(\mathbf{x}\) as follows:

\[f(\mathbf{x}) = f(z_S) + e_1 f(z_V). \quad (5.6)\]

In 2D Navier-Stokes there is no bivector component, so Eq. 5.6 reduces to:

\[f(\mathbf{x}) = K(\arg(z_S))z_S + e_1 K(\arg(z_V))z_V = \phi(x_0) + e_1 K(\arg(z_V))z_V \quad (5.7)\]


| Loss Type | Training Data | Clifford FNO + ReLU (MSE) | Clifford FNO + GA-ReLU (MSE) |
|---|---|---|---|
| One-Step Loss | 15600 | ~1.38E-02 | ~1.38E-02 |
| | 10400 | ~1.65E-02 | ~1.55E-02 |
| | 5200 | ~1.95E-02 | ~1.90E-02 |
| | 2080 | ~2.85E-02 | ~2.80E-02 |
| Scalar Loss | 15600 | ~1.55E-02 | ~1.50E-02 |
| | 10400 | ~1.75E-02 | ~1.65E-02 |
| | 5200 | ~2.80E-02 | ~2.75E-02 |
| | 2080 | ~3.95E-02 | ~3.90E-02 |
| Vector Loss | 15600 | ~1.05E-02 | ~1.05E-02 |
| | 10400 | ~1.25E-02 | ~1.20E-02 |
| | 5200 | ~1.50E-02 | ~1.45E-02 |
| | 2080 | ~2.25E-02 | ~2.20E-02 |

Figure 5.3: Three bar charts showing MSE error versus number of training data for Clifford FNO with ReLU and GA-ReLU activation functions. The charts are titled 'One-Step Loss', 'Scalar Loss', and 'Vector Loss'. The x-axis for all charts shows training data counts: 15600, 10400, 5200, and 2080. The y-axis represents MSE error. Two series are compared: FNO + ReLU (red bars) and CFNO + GA-ReLU (blue bars). In all cases, CFNO + GA-ReLU achieves lower MSE error than FNO + ReLU. Error bars are shown for each data point.

Fig. 5.3 MSE error with error bars versus number of training data for Clifford FNO with ReLU and GA-ReLU activation functions.

since \(z_S = x_0\) is real, so \(K(\arg(x_0)) = K(0) = 1\) for \(x_0 > 0\) and \(K(\arg(x_0)) = K(\pi) = 0\) for \(x_0 < 0\) (at \(x_0 = 0\) the term vanishes regardless). On the other hand, the second term can be computed to be:

\[e_1 K(\arg(z_V)) z_V = K(\tan^{-1}(\frac{x_2}{x_1})) x_1 e_1 + K(\tan^{-1}(\frac{x_2}{x_1})) x_2 e_2. \quad (5.8)\]

Eventually, we obtain that

\[f(\mathbf{x}) = \phi(x_0) + K(\arg(z_V)) x_1 e_1 + K(\arg(z_V)) x_2 e_2, \quad (5.9)\]

meaning that the complex ReLU acts like a real ReLU over the scalar part \(x_0\) and attenuates the vector components \(x_1, x_2\) by an amount proportional to the phase between them. Eq. 5.9, however, is unbounded for negative vector components (see Fig. 5.1b), which could cause numerical instability and negatively impact convergence. Hence, we still want to keep the element-wise ReLU on the vector components to also guarantee a dependence on their magnitude. The final expression of GA-ReLU is then:

\[\psi(\mathbf{x}) = (\phi \circ f)(\mathbf{x}) = \phi(x_0) + \phi(K(\tan^{-1}(\frac{x_2}{x_1})) x_1) e_1 + \phi(K(\tan^{-1}(\frac{x_2}{x_1})) x_2) e_2. \quad (5.10)\]

In Eq. 5.10 we have the advantage of having a magnitude scaling similar to a ReLU (i.e. 0 output for negative input), but also a phase dependency deriving from the complex ReLU (see Fig. 5.1c).
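Putting Eq. 5.10 together, a minimal numpy sketch of GA-ReLU on a single multivector with zero bivector component (coefficients stored as \([x_0, x_1, x_2]\), an assumed convention) could read:

```python
import numpy as np

def ga_relu(x):
    """GA-ReLU (Eq. 5.10) on a G(2,0) multivector with zero bivector,
    stored as coefficients [x0, x1, x2] (scalar, e1, e2).
    Scalar part: plain ReLU. Vector part: attenuate (x1, x2) by the
    cardioid factor K(arg(z_V)) of the vector phase, then apply the
    element-wise ReLU to keep a dependence on magnitude."""
    x0, x1, x2 = x
    K = 0.5 * (1.0 + np.cos(np.arctan2(x2, x1)))  # K(arg(z_V))
    return np.array([max(x0, 0.0), max(K * x1, 0.0), max(K * x2, 0.0)])

a = ga_relu(np.array([1.0, 1.0, 0.0]))    # vector along +e1 passes
b = ga_relu(np.array([-1.0, -1.0, 0.0]))  # vector along -e1 is zeroed
```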

5.3.2 Data generation

We evaluate Navier-Stokes PDEs over a regular square grid of size \(128 \times 128\) with resolution \(\Delta x = \Delta y = 0.25\) . The fluid has a viscosity of \(\nu = 0.01\) and a buoyancy factor of 0.05. The scalar field \(s\) at \(t = 0\) is initialised with Gaussian noise over a centered grid, while the vector field \(\mathbf{u}\) at \(t = 0\) is initialised to be 0 throughout over a staggered grid. We run the fluid dynamics simulations via PhiFlow [75] for 21s and sample every \(\Delta t = 1.5s\) . We start collecting data after \(\tau_0 = 4s\) to move away from initial conditions.

5.3.3 Training details

The Clifford ResNet in \(\mathcal{G}_{2,0}\) has 4 residual blocks with 2 Clifford convolutional layers each, kernel size \(3 \times 3\) and 64 hidden channels for a total of 2.4M trainable parameters. The Clifford FNO in \(\mathcal{G}_{2,0}\) has 4 FNO blocks, 6 Fourier modes for the \(x, y\) components and 48 hidden channels, for a total of 38M trainable parameters. We trained the Clifford ResNet on data with batch size of \(B = 16\) and the Clifford FNO on data with batch size of \(B = 32\) . Both networks have been trained for at most 200 epochs, implementing early stopping, monitoring validation loss with patience \(P = 15\) for the Clifford ResNet and \(P = 10\) for the Clifford FNO. We minimized the One Step loss between ground truth \(\mathbf{x}_{t+2}\) and predicted \(\hat{\mathbf{x}}_{t+2}\) using the Adam optimizer and a fixed learning rate of \(\eta = 10^{-4}\) . Results produced have been averaged over 3 random seeds. Training has been performed on a single GPU NVIDIA GeForce RTX 4090, taking 64s per batch to train the Clifford FNO and 48s per batch to train the Clifford ResNet. As expected, the activation function does not impact training time.
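The early-stopping schedule above (monitor the validation loss with patience \(P\)) is framework-independent; the helper below is an illustrative sketch, not the code used in the experiments:

```python
class EarlyStopper:
    """Stop training when the validation loss has not improved for
    `patience` consecutive epochs (P = 15 for the Clifford ResNet,
    P = 10 for the Clifford FNO in this section)."""
    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=3)
val_losses = [1.0, 0.8, 0.9, 0.85, 0.81]  # no improvement after epoch 2
flags = [stopper.step(v) for v in val_losses]
```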

5.3.4 Experiments

We generated our own dataset of fluid in motion over a regular grid through PhiFlow [75] following the specifications of [17] and the simulation settings described in Section 5.3.2. The dataset is composed of 15600, 4680 and 3120 training, validation and testing sequences, respectively. We use “sequence” to denote a pair of inputs and targets \((\mathbf{x}_{t_i}, \mathbf{x}_{t_{i+1}}; \mathbf{x}_{t_{i+2}})\) . We trained a Clifford ResNet and a Clifford FNO with standard ReLU and GA-ReLU activation functions. We report

the summed mean squared error over the three multivector coefficients (One Step Loss) and over the scalar (Scalar Loss) and vector (Vector Loss) coefficients.

5.3.5 Results


Figure 5.4: A 2x5 grid of scalar field difference plots. The top row shows GA-ReLU results and the bottom row shows ReLU results. Each plot displays a complex field with red and blue regions. Black square boxes highlight areas of noticeable difference. A color bar on the right indicates values from -0.4 (blue) to 0.4 (red).

Fig. 5.4 Difference between ground truth and predicted scalar fields \(s_{t+2} - \hat{s}_{t+2}\) for 5 different time instants. Top row: GA-ReLU, bottom row: ReLU. Higher intensity is worse. Square boxes highlight areas of noticeable difference.


Figure 5.5: A 2x5 grid of vector field difference plots (quiver plots). The top row shows GA-ReLU results and the bottom row shows ReLU results. Each plot shows a vector field with arrows. Red square boxes highlight areas of noticeable difference.

Fig. 5.5 Difference between ground truth and predicted vector fields (i.e. quiver plots) \(\mathbf{u}_{t+2} - \hat{\mathbf{u}}_{t+2}\) for 5 different time instants. Top row: GA-ReLU, bottom row: ReLU. Higher intensity is worse. Square boxes highlight areas of noticeable difference.

The 3 metrics measured for datasets of different sizes are reported in Fig. 5.2 and in Fig. 5.3 for the Clifford ResNet and Clifford FNO, respectively. Note how, albeit small, the improvement from GA-ReLU is consistent across different dataset sizes. We plot the difference between ground truth and predicted scalar fields \(s, \hat{s}\) for 5 different sequences in the test set in Fig. 5.4 and the difference between ground truth and predicted vector fields \(\mathbf{u}, \hat{\mathbf{u}}\) in Fig. 5.5. Note how, despite the activation function having been only minimally modified, it is possible to identify for each frame regions that deviate more from ground truth (i.e. areas in which the PDE solution is less exact) when employing an activation function that treats multivector components independently.

5.4 Fengbo: a neural operator for 3D PDEs over irregular geometries

In this Section we introduce a neural operator pipeline for computational fluid dynamics (CFD) cast entirely in Clifford Algebra. Named after the Taoist deity of the wind, Fengbo leverages the embedding of data within an algebra of choice in the form of multivectors, which are the fundamental objects in Clifford Algebra, to integrate physics and geometry data throughout the architecture. Its operators, layers and neurons are all expressed as multivectors in Clifford Algebra.


The diagram illustrates the Fengbo architecture. It starts with two input multivectors: 'pressure geometry multivector' (P) and 'velocity geometry multivector' (V). Both are processed by a '3D Clifford Geometry Block' (part of the 'geometry domain'). The output of these blocks is then processed by a '3D Clifford FNO' (part of the 'global features' block). The output of the FNO is then processed by two '3D Clifford Physics Block' (part of the 'physics domain'). The output of the first physics block is a 'pressure scalar field' (P'), and the output of the second is a 'velocity 3D vector field' (V').


Fig. 5.6 The Fengbo architecture. Irregular geometries are discretised into fixed-resolution volumes of multivectors, over which Fengbo operates. It consists of three steps: (i) The geometry blocks operate on the geometry of the PDEs domain, capture local features, ensure grade mixing and upsample the inputs; (ii) The Clifford FNO establishes a mapping between the PDEs' geometry and their solution; (iii) The physics blocks operate on physical quantities, i.e. target of the regression. The entire architecture sits in 3D Clifford Algebra, guaranteeing interpretability.

Fengbo has three main components: (i) 3D Clifford Geometry block(s): one for each input geometry in the dataset, to mix elements of different grades in multivectors with geometrical meaning. (ii) 3D Clifford Fourier Neural Operator (FNO): to capture global interactions and map multivectors from the geometry to the physics domain. We extended their implementation in [17] to process full-grade 3D multivectors as opposed to only vector and bivector components. (iii) 3D Clifford Physics block(s): Similar to (i), but for multivectors with physical meaning. There is one Physics block for each output physical quantity to estimate in the dataset.

We tested Fengbo on two 3D computational fluid dynamics (CFD) datasets generated and analysed in [112]. Fengbo takes input multivectors representing the shape of the vehicles and estimates the pressure field on their surfaces as well as the velocity field defined over the domain. It does so with fewer than half the parameters required by the GINO architecture ([112]) and by directly processing the geometries employed in CFD. Since every intermediate output in Fengbo is a multivector with geometrical or physical meaning, Fengbo is a whitebox model that allows for a clearer understanding of how data are processed and transformed from geometry to physics.

5.4.1 Method

Notation. We employ lowercase Latin letters for scalar quantities (e.g. \(p_1, v_1\) ), boldface Latin letters for vectors (e.g. \(\mathbf{x}, \mathbf{n}, \mathbf{p}, \mathbf{v}\) ), uppercase Latin letters for multivectors (e.g. \(P, V, Q, B, W\) ) or integers (e.g. \(K, N, M, C\) ), lowercase Greek letters for real-valued maps (e.g. \(\phi, \psi\) ) and uppercase, boldface Greek letters for multivector-valued maps (e.g. \(\Phi, \Xi\) ). We use a dash symbol to distinguish multivectors describing geometrical quantities from those describing physical ones (e.g. \(P, P'\) ).

Navier-Stokes equations revisited. We reformulate Navier-Stokes equations introduced in Section 5.3 for consistency with the notation employed in the Fengbo project. Navier-Stokes equations describe the motion of fluids. We rewrite them as follows:

\[\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \psi) = 0 \quad (5.11)\]

Eq. 5.11 represents the conservation of mass in a fluid flow.

\[\rho \frac{\partial \psi}{\partial t} + \rho(\psi \cdot \nabla) \psi = -\nabla \phi + \mu \nabla^2 \psi + \mathbf{f} \quad (5.12)\]

with \(\phi, \psi\) the scalar pressure and vector velocity fields, respectively. Eq. 5.12 describes the conservation of momentum. As in many CFD applications, the equations can be simplified by assuming that the fluid is incompressible and that no forcing terms are present. We restrict this work to the steady-state model, in which partial derivatives in time are null.

5.4.2 Architecture

The Fengbo architecture, shown in Fig. 5.6, maps the geometry of the domain of the PDEs onto their solution. Specifically, we are interested in estimating jointly the one-dimensional (1D) pressure field \(\phi(\mathbf{x}) : \Omega_D \subset \mathbb{R}^3 \rightarrow \mathbb{R}\) and the three-dimensional (3D) velocity field \(\boldsymbol{\psi}(\mathbf{x}) : \Omega_D \subset \mathbb{R}^3 \rightarrow \mathbb{R}^3\) that satisfy Eqs. 5.11-5.12 over an irregular domain \(\Omega_D\) . We do it as follows:

Voxelisation of the fluid domain. To deal with an irregular domain we need to support unstructured meshes, which are commonly used in CFD. We do so by generating a regular grid of \(M \times M \times M\) voxels inside the fluid domain \(\mathcal{D}\) . In general the voxels do not fit the boundary, and the discretisation parameter \(M\) should be sufficiently large to capture a good level of geometric detail. Our domain is a discrete volume of 3D space throughout, a simpler alternative to embedding in latent space or to using a graph representation of the data.
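The voxelisation step can be sketched as a simple rasterisation of the point cloud into an occupancy grid (a minimal sketch assuming an axis-aligned bounding box; the function name is ours):

```python
import numpy as np

def voxelise(points, M, lo, hi):
    """Rasterise an N-point cloud of shape (N, 3) into an M x M x M
    occupancy grid spanning the axis-aligned box [lo, hi]^3.
    Boundary points are clipped into the last voxel."""
    h = (hi - lo) / M                      # voxel edge length
    idx = np.floor((points - lo) / h).astype(int)
    idx = np.clip(idx, 0, M - 1)           # keep boundary points inside
    grid = np.zeros((M, M, M), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

pts = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]])
g = voxelise(pts, M=4, lo=0.0, hi=1.0)
```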

Clifford Algebra embedding. We fill in each voxel \(i, j, k \in \mathcal{D}\) with a multivector \(P : \mathcal{D} \rightarrow G(3, 0, 0)\) , in which \(\mathcal{D} \subset \mathbb{R}^3\) represents the discrete grid of voxels in which the multivector field is defined and \(G(3, 0, 0)\) is the 3D Clifford Algebra. We call \(P_{ijk}\) the multivector associated with the voxel specified by indexes \(i, j, k\) . We construct multivectors \(P\) to have a scalar component \(m_p\) , a vector component \(\mathbf{p}\) and a bivector component \(B\) .

  • The scalar part is a binary mask \(m_p\) , included to inform the network about which voxels are filled and avoid ambiguity between the origin and empty voxels in the vector part, since for both it holds that \(p_1 = p_2 = p_3 = 0\) .
  • The vector part \(\mathbf{p}\) represents the \(N\) -point point cloud \(\mathbf{p} \in \mathbb{R}^3\) of coordinates in 3D space and it encodes information about the shape or contour of the object.
  • The bivector \(B\) represents the plane orthogonal to the normal \(\mathbf{n}\) defined for each point in \(\mathbf{p}\) . In other words, \(B\) is the dual of \(\mathbf{n}\) , i.e. \(B = I_3 \mathbf{n}\) , in which \(\mathbf{n}\) is the normal vector perpendicular to the mesh points on the car surface and \(I_3 = e_1 \wedge e_2 \wedge e_3\) is the \(G(3, 0, 0)\) pseudoscalar. \(B\) has its own surface area and orientation and hence it can be interpreted as containing information about the surface of the object.
  • The trivector component is left blank.

Since the output pressure field \(\phi(\mathbf{x})\) is defined over each point of \(\mathbf{p}\) , we call \(P\) the pressure geometry multivector. An example of a pressure geometry multivector


Figure 5.7: An example of pressure geometry multivector P. The figure shows a decomposition of a 3D car model into its geometric components: scalar (binary mask), vector (3D coordinates), bivector (dual of surface normals), and trivector (None).

Fig. 5.7 An example of pressure geometry multivector \(P\) . It has a scalar component (the binary mask \(m_p\) ), 3 vector components (the 3D coordinates \(\mathbf{p}\) ) and 3 bivector components (the dual of the vectors \(\mathbf{n}\) normal to points \(\mathbf{p}\) ).

is given in Fig. 5.7. The general form of the pressure geometry multivector is:

\[P = m_p + \mathbf{p} + B = \underbrace{m_p}_{\text{scalar}} + \underbrace{p_1 e_1 + p_2 e_2 + p_3 e_3}_{\text{vector}} + \underbrace{B_{12} e_{12} + B_{13} e_{13} + B_{23} e_{23}}_{\text{bivector}} \quad (5.13)\]
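Assembling Eq. 5.13 over the voxel grid can be sketched as follows, assuming the 8-coefficient layout \([1, e_1, e_2, e_3, e_{12}, e_{13}, e_{23}, e_{123}]\) (a storage convention we adopt for illustration; the function name is hypothetical):

```python
import numpy as np

# Assumed coefficient layout for a G(3,0,0) multivector:
# [scalar, e1, e2, e3, e12, e13, e23, e123]
def pressure_geometry_multivector(grid_points, normals, mask, M):
    """Fill an (8, M, M, M) volume with P = m_p + p + B (Eq. 5.13).
    `grid_points` and `normals` are (M, M, M, 3) voxel-wise fields;
    `mask` is the (M, M, M) binary occupancy mask. The bivector is the
    dual B = I3 n: with e_i^2 = 1, n1 e1 + n2 e2 + n3 e3 maps to
    n3 e12 - n2 e13 + n1 e23."""
    P = np.zeros((8, M, M, M))
    P[0] = mask                                # scalar: binary mask m_p
    P[1:4] = np.moveaxis(grid_points, -1, 0)   # vector: coordinates p
    n = np.moveaxis(normals, -1, 0)
    P[4], P[5], P[6] = n[2], -n[1], n[0]       # bivector: dual of n
    # trivector slot P[7] is left blank
    return P
```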

For datasets that include other physical fields, we define other input multivectors. For example, if the velocity vector field \(\mathbf{v}\) is known, we construct a corresponding multivector \(V\) . The multivector \(V : \mathcal{D} \rightarrow G(3, 0, 0)\) is also defined over a regular grid of voxels. We construct \(V\) to have a scalar component \(m_v\) and a vector component \(\mathbf{v}\) . The vector component \(\mathbf{v}\) corresponds to the \(K\) -point point cloud \(\mathbf{v} \in \mathbb{R}^3\) , with \(K \gg N\) , defined for points surrounding the car surface, and \(m_v\) is its corresponding binary mask defined similarly to \(m_p\) . Since the output velocity field \(\psi(\mathbf{x})\) is defined over each point of \(\mathbf{v}\) , we call \(V\) the velocity geometry multivector. Each velocity geometry multivector \(V\) is of the form:

\[V = m_v + \mathbf{v} = \underbrace{m_v}_{\text{scalar}} + \underbrace{v_1 e_1 + v_2 e_2 + v_3 e_3}_{\text{vector}} \quad (5.14)\]

in which, similarly to \(P\) ,

  • The scalar part is a binary mask \(m_v\) .
  • The vector part \(\mathbf{v}\) represents the \(N\) -point point cloud \(\mathbf{v} \in \mathbb{R}^3\) of coordinates in 3D space over which the velocity field is defined.
  • The bivector and trivector components are left blank.


Figure 5.8: An example of velocity geometry multivector V. The figure shows a decomposition of V into its components: scalar (binary mask), vector (3D coordinates), bivector (None), and trivector (None).

Fig. 5.8 An example of velocity geometry multivector \(V\) . It has a scalar component (the binary mask \(m_v\) ) and 3 vector components (the 3D coordinates \(\mathbf{v}\) ).

An example of \(V\) is shown in Fig. 5.8. The geometry multivectors \(P\) and \(V\) are representative of the geometry simply because they are themselves the geometry of the PDEs domain.

3D Clifford Geometry block. We define the 3D Clifford Geometry block as the module acting on volumes of multivectors with a sequence of three 3D convolutional layers in Clifford Algebra. In Clifford Algebra Networks, weights and biases are multivectors \(W, D \in G(3, 0, 0)\) , and convolutions are performed via geometric products:

\[Q_{ijk}^{(c_{out})} = \sum_{c_{in}=1}^C \sum_l \sum_m \sum_n P_{i+l,j+m,k+n}^{(c_{in})} W_{lmn}^{(c_{in},c_{out})} + D^{(c_{out})} \quad (5.15)\]

with \(W\) weights and \(D\) biases, where the range of the summation over \(l, m, n\) is specified by the kernel size and \(c_{in}, c_{out}\) are the input and output channels, respectively. The 3D Clifford Geometry block takes in as input a single geometry multivector (e.g. \(P, V\) ) and outputs \(C\) channels of multivectors. It serves four purposes:

  • grade mixing: multivectors \(P, V\) only contain elements of certain grades. Convolutional layers allow grades to mix and produce full multivectors in 3D Clifford Algebra (i.e. with scalar up to trivector components).
  • capturing local interactions: convolutions are traditionally used to extract features from data which are close to each other in space.
  • filling up the volume: fitting an irregular grid into a regular one requires a high-resolution grid, meaning that most of the initial input volume is sparse. Convolutions with a large enough kernel avoid this issue by filling up the volume.
  • increasing the number of channels in input to the Clifford FNO block.

We refer to the output of the geometry block processing the shape over which the pressure field is defined (i.e. the pressure multivector \(P\) ) as \(Q_P\) and to the output of the geometry block processing the shape over which the velocity field is defined (i.e. the velocity multivector \(V\) ) as \(Q_V\) .
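The geometric product that replaces the scalar multiply-accumulate in Eq. 5.15 can be generated from the basis-blade multiplication rules; a self-contained numpy sketch under the same assumed 8-coefficient layout \([1, e_1, e_2, e_3, e_{12}, e_{13}, e_{23}, e_{123}]\):

```python
import numpy as np

# Basis blades of G(3,0,0) in the assumed coefficient order.
BLADES = [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
INDEX = {b: i for i, b in enumerate(BLADES)}

def blade_product(a, b):
    """Multiply two basis blades: sort the concatenated indices
    (each adjacent swap flips the sign, since e_i e_j = -e_j e_i)
    and cancel repeated indices (e_i^2 = +1 in G(3,0,0))."""
    idx, sign = list(a + b), 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    out = []
    for k in idx:
        if out and out[-1] == k:
            out.pop()          # e_k e_k = +1
        else:
            out.append(k)
    return sign, tuple(out)

def geometric_product(x, w):
    """Geometric product of two multivectors given as 8 coefficients."""
    out = np.zeros(8)
    for i, a in enumerate(BLADES):
        for j, b in enumerate(BLADES):
            s, blade = blade_product(a, b)
            out[INDEX[blade]] += s * x[i] * w[j]
    return out
```

A Clifford convolution is then an ordinary 3D convolution in which each kernel tap contributes `geometric_product(P_tap, W_tap)` instead of a scalar product.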

Clifford FNO block. The 3D Fourier Neural Operator (FNO) in Clifford Algebra learns a multivector valued function \(\Phi(Q) : G(3, 0, 0) \rightarrow G(3, 0, 0)\) described by

\[Q' = \Phi(Q) = \mathcal{F}^{-1}\{\mathcal{F}\{Q\}(\xi) \cdot \mathcal{F}\{\mathbf{k}_a\}(-\xi)\} \quad (5.16)\]

in which \(\xi\) is the variable in Fourier domain, \(\mathbf{k}_a : \mathbb{R}^3 \rightarrow G(3, 0, 0)\) is the learnable filter of the FNO and \(\mathcal{F}\) and \(\mathcal{F}^{-1}\) are the Fourier and inverse Fourier transforms, respectively, with the Fourier transform in \(G(3, 0, 0)\) applied pointwise over each real coefficient of \(Q\) and defined as:

\[\hat{Q}(\xi) = \mathcal{F}\{Q\}(\xi) = \hat{Q}_0 + \hat{Q}_1 e_1 + \hat{Q}_2 e_2 + \hat{Q}_3 e_3 + \hat{Q}_{12} e_{12} + \hat{Q}_{13} e_{13} + \hat{Q}_{23} e_{23} + \hat{Q}_{123} e_{123}. \quad (5.17)\]

\(Q\) is defined as the sum of all the multivectors output by the Geometry blocks. The codomain of \(\Phi(Q)\) is also multivector valued, and each multivector output by the 3D Clifford FNO, which we refer to as \(Q'\) , is defined on a grid with the same resolution \(M\) as the inputs. The FNO captures global interactions within the geometry and maps the input multivectors from the geometry domain to the physics domain.
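One layer of the spectral convolution in Eqs. 5.16-5.17 can be sketched per real coefficient, as in a standard FNO (illustrative only: mode truncation is simplified to the lowest corner of the spectrum, and the spectral multiplication is plain complex multiplication per coefficient rather than the full multivector product):

```python
import numpy as np

def spectral_conv3d(Q, weights, modes):
    """Minimal sketch of one Clifford FNO layer (Eq. 5.16): FFT each of
    the 8 real multivector coefficients (Eq. 5.17), keep only the lowest
    `modes` frequencies, multiply them by learnable complex weights and
    transform back. Q: (8, M, M, M); weights: complex (8, m, m, m)."""
    Q_hat = np.fft.rfftn(Q, axes=(1, 2, 3))
    out = np.zeros_like(Q_hat)
    m = modes
    out[:, :m, :m, :m] = Q_hat[:, :m, :m, :m] * weights
    return np.fft.irfftn(out, s=Q.shape[1:], axes=(1, 2, 3))

Q = np.ones((8, 4, 4, 4))                      # a constant multivector field
w = np.ones((8, 2, 2, 2), dtype=complex)       # identity-like filter
out = spectral_conv3d(Q, w, modes=2)
```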

3D Clifford Physics block. The 3D Clifford Physics block is analogous to its Geometry counterpart, except that it brings the \(C\) channels of multivectors \(Q'\) output by the FNO down to 1. As we estimate two different quantities, we have two different blocks to output \(P'\) and \(V'\) , the pressure physics multivector and velocity physics multivector, respectively, for which we set

\[\langle P' \rangle_0 = \phi \quad (5.18)\]

\[\langle V' \rangle_1 = \psi_1 e_1 + \psi_2 e_2 + \psi_3 e_3 \quad (5.19)\]

where \(\langle \cdot \rangle_k\) is the grade projector operator, which extracts the \(k\) -grade element out of the multivector. In short, Fengbo models the PDE solution problem as a mapping \(\Xi(\cdot)\) of 3D (geometry) multivectors onto 3D (physics) multivectors in 3D Clifford Algebra \(G(3, 0, 0)\) , i.e.

\[\mathbf{P}' = \Xi(\mathbf{P}) \quad (5.20)\]

Table 5.1 The three steps of the Fengbo pipeline.

| Module | Input | Output | Purpose |
|---|---|---|---|
| 1. 3D Clifford Geometry blocks | \(\mathbf{P} = \{P_i\}_{i=1}^{N_g}\) | \(\mathbf{Q} = \{Q_i\}_{i=1}^{N_g}\) | local, upsample, grade mixing |
| 2. 3D Clifford FNO | \(Q = \sum_{i=1}^{N_g} Q_i\) | \(Q'\) | global, PDE modelling |
| 3. 3D Clifford Physics blocks | \(Q'\) | \(\mathbf{P}' = \{P'_i\}_{i=1}^{N_p}\) | local, downsample, grade mixing |

in which \(\mathbf{P} = \{P_i\}_{i=1}^{N_g}\) , with \(N_g\) the number of input geometries in the dataset, and \(\mathbf{P}' = \{P'_i\}_{i=1}^{N_p}\) , with \(N_p\) the number of output physical quantities to estimate. \(N_g\) and \(N_p\) determine the number of Geometry and Physics blocks in Fengbo, respectively. The steps in the Fengbo architecture are summarised in Table 5.1.
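The grade projector \(\langle \cdot \rangle_k\) of Eqs. 5.18-5.19 reduces to slicing under the assumed 8-coefficient layout \([1, e_1, e_2, e_3, e_{12}, e_{13}, e_{23}, e_{123}]\):

```python
import numpy as np

# Coefficient slots per grade in the assumed layout
# [1, e1, e2, e3, e12, e13, e23, e123]:
GRADE_SLOTS = {0: [0], 1: [1, 2, 3], 2: [4, 5, 6], 3: [7]}

def grade_project(X, k):
    """<X>_k: zero out every coefficient not of grade k (used in
    Eqs. 5.18-5.19 to read phi and psi out of P' and V')."""
    out = np.zeros_like(X)
    out[GRADE_SLOTS[k]] = X[GRADE_SLOTS[k]]
    return out
```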

5.4.3 Experiments

Datasets

ShapeNet Car. The ShapeNet Car dataset is a subset of the larger ShapeNet 3D model repository consisting of thousands of realistic 3D car models employed in a CFD simulation with constant inlet flow velocity [29]. It contains 500 shapes for training and 111 for testing. For this dataset, \(N_g = 2, N_p = 2\) , i.e. \(\mathbf{P} = \{P, V\}\) (two input geometries) and \(\mathbf{P}' = \{P', V'\}\) (two physical quantities to estimate, defined over the two different geometries).

Ahmed Body. The Ahmed Body dataset consists of CFD simulations with varying inlet flow velocity \(\psi_{in}\) [5]. It contains 500 parametric variations of Ahmed's body for training and 51 for testing. For this dataset, \(N_g = 1, N_p = 1\) , i.e. \(\mathbf{P} = \{P\}\) (a single input geometry) and \(\mathbf{P}' = \{P'\}\) (one physical quantity to estimate, no velocity field information provided). The inlet velocity is a crucial component since the output pressure field range depends on it. We embedded it as the trivector component of \(P\) since it has a single component in one direction, i.e. \(\psi_{in}e_1 + 0e_2 + 0e_3\) . We do so by setting \(\langle P \rangle_3 = (m_p \cdot \psi_{in})e_{123}\) , in which \(m_p\) is the binary mask.
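The trivector embedding of the inlet velocity can be sketched as follows. This is a simplified reading of \(\langle P \rangle_3 = (m_p \cdot \psi_{in})e_{123}\): the grid size, mask shape, and component index are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

M, D = 4, 8          # toy grid resolution and G(3,0,0) dimensionality
TRIVECTOR = 7        # index of the e123 slot (assumed component ordering)

P = np.zeros((M, M, M, D))       # geometry multivector field on the voxel grid
m_p = np.zeros((M, M, M))
m_p[1:3, 1:3, 1:3] = 1.0         # binary mask: voxels occupied by the body

psi_in = 30.0                    # scalar inlet flow velocity (arbitrary value)
P[..., TRIVECTOR] = m_p * psi_in # <P>_3 = (m_p * psi_in) e123
```

Because the inlet flow has a single nonzero component, one scalar per occupied voxel is enough, and the otherwise-unused trivector slot carries it.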

Metrics

We assess the quality of the pressure field estimation through the relative \(L_2\) norm (a percentage), defined as:

\[\mathcal{L}_P = \frac{\|\langle P'_{GT} \rangle_0 - \langle P' \rangle_0\|_2}{\|\langle P'_{GT} \rangle_0\|_2} = \frac{\|\phi(\mathbf{x}) - \hat{\phi}(\mathbf{x})\|_2}{\|\phi(\mathbf{x})\|_2} \quad (5.21)\]

![Fig. 5.9](41c354be5fdbdb4f0fc864d4ea9d3363_img.jpg)

Fig. 5.9 (a) Ground truth pressure field, (b) Fengbo's estimated pressure field, and (c) their relative error for a test shape in the ShapeNet Car dataset.

for the pressure field, and as

\[\mathcal{L}_V = \frac{\| \langle V'_{GT} \rangle_1 - \langle V' \rangle_1 \|_2}{\| \langle V'_{GT} \rangle_1 \|_2} = \frac{\| \psi(\mathbf{x}) - \hat{\psi}(\mathbf{x}) \|_2}{\| \psi(\mathbf{x}) \|_2} \quad (5.22)\]

for the velocity field, in which \(\hat{\phi}, \hat{\psi}\) denote the pressure and velocity fields estimated by Fengbo, extracted as the grade-0 and grade-1 components of the estimated physics multivectors \(P', V'\), respectively, while \(\phi, \psi\) denote the ground truth fields, extracted as the grade-0 and grade-1 components of the ground truth physics multivectors \(P'_{GT}, V'_{GT}\), respectively. The relative \(L_2\) norm has also been employed as the loss function during training.
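Eqs. (5.21)-(5.22) reduce to the same function applied to different grade components. A minimal numpy sketch:

```python
import numpy as np

def relative_l2(gt, est):
    """Relative L2 norm ||gt - est||_2 / ||gt||_2, reported as a percentage."""
    return 100.0 * np.linalg.norm(gt - est) / np.linalg.norm(gt)

phi = np.array([1.0, 2.0, 2.0])   # toy "ground truth" pressure samples
err = relative_l2(phi, np.zeros(3))
```

A perfect prediction gives 0%; predicting all zeros gives exactly 100%, which makes the scale of the tabulated errors (e.g. 8.86%) easy to read.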

Results

Results are summarised in Tables 5.2 and 5.3 for the ShapeNet Car and the Ahmed Body datasets, respectively, following the structure of [112]. Fengbo outperforms all variants of vanilla Fourier and Graph Neural Operators, as well as UNet and MeshGraphNet, and yields results comparable to GINO. For the ShapeNet Car dataset, for example, Fengbo estimates the pressure field with a 0.6% lower relative \(L_2\) norm compared to GINO in its encoder-decoder (e-d) configuration, but with a 1.6% higher error compared to its decoder-only (d) configuration. Fengbo achieves this with 60% fewer parameters and a computational cost that is independent of the complexity of the input graph. Moreover, it is, to our knowledge, the only method that estimates the pressure field and the 3D velocity field jointly. Fengbo achieves competitive accuracy thanks to this coupling of physical quantities and through simple convolutions on coarsely discretised meshes.

This is especially notable when compared to more sophisticated architectures, including Geo-FNO [110], which degenerates when dealing with complex geometry, as shown in [112, 188], despite being precisely designed to learn to deform irregular domains onto a regular grid to be fed into the FNO. Transformer-based models such as ONO [193] and OFormer [110] also become unstable when dealing with large meshes, as found in [188].

Table 5.2 Training and testing errors in pressure and velocity prediction on the ShapeNet Car dataset. Fengbo results have been obtained with \(\alpha, \beta = \{5, 1\}\) for pressure and with \(\alpha, \beta = \{1, 50\}\) for velocity (see Section 5.4.6).

| Model | Pressure (training) | Pressure (testing) | Velocity (training) | Velocity (testing) |
|---|---|---|---|---|
| MLP | - | 13.0 | - | 5.12 |
| PointNet [152] | - | 11.0 | - | 4.94 |
| Graph U-Net [55] | - | 11.0 | - | 4.71 |
| GraphSage [62] | - | 10.5 | - | 4.61 |
| MeshGraphNet [149] | - | 7.81 | - | 3.54 |
| GNO [111] | 18.2 | 18.8 | - | 3.83 |
| Geo-FNO [113] | 10.8 | 15.9 | - | 16.7 |
| UNet [156] | 12.5 | 12.8 | - | - |
| FNO [113] | 9.65 | 9.42 | - | - |
| GINO (encoder-decoder) [112] | 7.95 | 9.47 | - | 3.86 |
| GINO (decoder) [112] | 6.37 | 7.12 | - | - |
| Fengbo [ours] | 6.94 | 8.86 | 3.23 | 3.47 |

Table 5.3 Training and testing errors in pressure prediction on the Ahmed Body dataset.

| Model | training | testing |
|---|---|---|
| MeshGraphNet | 9.08 | 13.9 |
| UNet | 9.93 | 11.2 |
| FNO | 13.0 | 12.6 |
| GINO (e-d) | 9.36 | 9.01 |
| GINO (d) | 9.34 | 8.31 |
| GINO (e-d), \(r = 0.025\) | 12.9 | 12.8 |
| GINO (d), \(r = 0.025\) | 12.6 | 12.7 |
| GINO (e-d), \(r = 0.035\) | 9.26 | 9.30 |
| GINO (d), \(r = 0.035\) | 8.82 | 9.39 |
| Fengbo [ours] | 8.00 | 10.7 |

Interestingly, estimating the velocity vector field appears to be an easier task. This is likely due to the significantly smaller variance of the velocity data compared to the sharp pressure variation over the car surface, as well as the fact that the velocity field \(\psi(\mathbf{x})\) is defined over a point cloud containing \(10\times\) more points than the one over which \(\phi(\mathbf{x})\) is defined. This is mirrored also in the smaller gap between training and testing errors. The additional estimation of the velocity vector field does not imply a computational overhead, since the vector and pressure fields exist naturally within the multivector-based formulation of the problem and they are both embedded in a fixed-size volume. Note that this would not apply to graph-based methods, in which a larger cell count would mean a larger number of nodes, drastically increasing the computational complexity (see Table 5.4).

![Fig. 5.10](ca493068aa5ca25cfd155c2d3639ee21_img.jpg)

Fig. 5.10 (a) Ground truth pressure field, (b) Fengbo's estimated pressure field, and (c) their relative error for a test shape in the Ahmed Body dataset.

Similar observations can be made for the Ahmed Body dataset. Fengbo outperforms all models reported, with the exception of some GINO configurations, depending on the choice of the radius of the Graph Neural Operator module. It is worth mentioning that the experiments in [112] could benefit from the joint estimation of the wall shear stress, a physical parameter that was missing in the version of the dataset we employed. We expect that regressing jointly on the wall shear stress as well, just as for pressure and velocity on ShapeNet Car, could bring down the 10.7% error we obtained on the test set with Fengbo. Note also that Fengbo attains a training error of just 8%, the lowest of all models reported, indicating that additional parameter optimisation could likely reduce overfitting and bring the error down even further.

Comparisons between Fengbo's estimates and the corresponding ground truth pressure fields are given in Fig. 5.9 and Fig. 5.10 for the ShapeNet Car and Ahmed Body datasets, respectively. Note how errors in the pressure field are generally isolated points in a more or less uniform region with relative error close to zero. We are convinced that simply smoothing the predicted field could mitigate this issue and improve performance.

![Fig. 5.11](c725c5cdd568eb2ba1793def3680b20e_img.jpg)

Fig. 5.11 (a) Ground truth velocity field, (b) Fengbo's estimated velocity field, and (c) their relative error for a test shape in the ShapeNet Car dataset.

In Fig. 5.11, the ground truth and estimated velocity fields for a test case in the ShapeNet Car dataset are reported. Note how the range of the relative \(L_2\) error in Fig. 5.11c is significantly smaller than the ranges in Figs. 5.9c and 5.10c. This is likely due to the denser, larger point clouds over which \(\psi(\mathbf{x})\) is defined. Note also how larger errors are concentrated in the areas surrounding the outline of the car. Small discontinuities in the estimated field with respect to ground truth can be noticed in Fig. 5.11b, for example in the bottom-right section: just as for \(\hat{\phi}(\mathbf{x})\), we believe that smoothing the estimated field \(\hat{\psi}(\mathbf{x})\) can further reduce the prediction error.

![Fig. 5.12](633486b12958f97e062b5cf9e0801e99_img.jpg)

Fig. 5.12 Intermediate outputs are interpretable physical quantities: \(Q'\), the output of the 3D Clifford FNO block, is processed by \(N_p = 2\) 3D Physics blocks in parallel to obtain \(\hat{\phi}(\mathbf{x})\) and \(\hat{\psi}(\mathbf{x})\), the pressure and velocity fields, respectively.

An example of the interpretability offered by Fengbo is given in Fig. 5.12. \(Q'\), the multivector output of the 3D Clifford FNO module, is processed by the 2 Clifford Physics blocks to obtain 2 multivectors \(P'\) and \(V'\), from which we extract the scalar part \(\hat{\phi}(\mathbf{x})\) and the vector part \(\hat{\psi}(\mathbf{x})\), respectively. Note that we are still dealing with full-grade multivectors defined over the entire domain \(\mathcal{D}\), but for the sake of visualisation we only plot the scalar and vector components masked by \(m_s\) and \(m_v\), respectively. As the velocity and pressure fields are processed, it is possible to gain a visual intuition of how they are being transformed into the final estimate. Since the quantities plotted are scalars and vectors throughout, they carry physical meaning and cannot be interpreted as anything other than pressure and velocity fields; we can therefore claim that Fengbo is a white-box model. This concept of interpretable convergence is analogous to that presented in [137] for protein structures and in [142] for camera poses.

Fengbo has a computational complexity of \(O(N \log N)\) (see Table 5.4): the embedding into a 3D volume has complexity \(O(N)\) , and the limiting component on the computational complexity is given by the Clifford FNO module, with complexity \(O(N \log N)\) . Moreover, Fengbo's accuracy is minimally impacted by smaller grid resolutions, making it robust to coarser discretisations and hence discretisation convergent.

5.4.4 Implementation details

Details of the Fengbo architecture are shown in Fig. 5.13. The 3D Clifford Geometry block (Fig. 5.13a) consists of three 3D convolutional layers with kernel size \(5 \times 5 \times 5\). The first two convolutions are each followed by a group normalisation layer and a GeLU activation function. The block yields multivectors with a progressively increasing number of channels \(C_g = \{1, 2, 4\}\).

Table 5.4 Comparison of different models. \(d\) is the maximum degree of the graph, \(D\) is the feature space dimensionality. *: See notes on computational complexity.

| Model | Range | Complexity | Irregular Grid | Discretisation Convergent |
|---|---|---|---|---|
| PointNet [152] | global | \(\mathcal{O}(N)\) | | |
| PointNet++ [152] | local-global | \(\mathcal{O}(N \log N)\) | | |
| GNN [162] | local | \(\mathcal{O}(Nd)\) | | |
| CNN [104] | local | \(\mathcal{O}(N)\) | | |
| UNet [156] | global | \(\mathcal{O}(N)\) | | |
| Transformers [177] | radius \(r\) | \(\mathcal{O}(N^2)\) | | |
| Transolver [188] | local-global | \(\mathcal{O}(NSC + NS^2)\)* | | |
| Galerkin [25] | global | \(\mathcal{O}(ND^2)\) | | |
| MeshGraphNet [149] | local-global | \(\mathcal{O}(Nd)\) | | |
| GNO [111] | global | \(\mathcal{O}(Nd)\) | | |
| FNO [113] | global | \(\mathcal{O}(N \log N)\) | | |
| Geo-FNO [113] | global | \(\mathcal{O}(N \log N)\) | | |
| GINO [112] | local-global | \(\mathcal{O}(N \log N + Nd)\) | | |
| Fengbo [ours] | local-global | \(\mathcal{O}(N \log N)\) | | |

![Fig. 5.13](6a1a398899211ab3a8cf416d3a8f3fff_img.jpg)

Fig. 5.13 The three components of the Fengbo architecture: (a) 3D Clifford Geometry block, (b) 3D Clifford Physics block, (c) 3D Clifford FNO block.

An example of the geometric interpretability of the intermediate outputs of Fengbo is given in Fig. 5.14. Multivector \(P\), containing scalar, vector and bivector components, is processed by the 3D Clifford Geometry block to obtain the 4-channel multivector \(Q\). We employ a grayscale colormap for scalar quantities, i.e. scalar and trivector components, and jet and rainbow colormaps for the vector and bivector components, respectively. The input shape multivector \(P\) built from the dataset is scattered within the \([-1, 1]\) volume, bounded by the \(\tanh\) activation function. Elements of different grades are mixed, as can be noticed from the appearance of trivector components. The sequence of convolutions makes the 3D multivectors progressively denser. The last multivector \(Q_P\), with 4 channels, is unbounded due to the lack of an activation function and is fed into the 3D Clifford FNO. Each channel shows how different grade elements in the volume cluster to form different shapes, more or less aligned in a certain direction. While far from the original car shape, these blobs indeed represent scalars, vectors, bivectors and trivectors in 3D space: the vector components shown, for example, cannot be interpreted as anything other than coordinates of 3D point clouds, precisely because of our choice of embedding. Similar considerations can be made for \(V\) and \(Q_V\).

![Fig. 5.14](4944a3da871459deef6987c6be0a097e_img.jpg)

Fig. 5.14 Intermediate layer outputs from \(P\) to \(Q_P\) within the 3D Clifford Geometry block for a test sample in the ShapeNet Car dataset.

Fig. 5.13b shows the 3D Clifford Physics block. It contains the same layers as the 3D Clifford Geometry block, but with a decreasing number of channels \(C_p = \{2, 1, 1\}\), as shown in Fig. 5.12, and a different meaning attached to the multivector representation, in which the scalar and vector parts represent the pressure field \(\phi(\mathbf{x})\) and the velocity field \(\psi(\mathbf{x})\), respectively.

Fig. 5.13c shows the 3D Clifford FNO block. It includes \(F = 2\) residual blocks, 4 input channels, 4 output channels, \(C = 25\) hidden channels and \(m = 8\) Fourier modes for each of its 3 dimensions. Each layer is followed by a group normalisation layer and a GeLU activation function.
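A rough, back-of-the-envelope sketch of how the spectral weight count scales with these hyperparameters. The formula below is an *assumed* form (one complex weight per mode triple, channel pair, block, and algebra component); the real layer also has pointwise convolutions and may share weights across grades, so treat this as an order-of-magnitude guide only:

```python
# Assumed parameter model for the spectral weights of the 3D Clifford FNO:
# F blocks, C-in x C-out channel mixing, m^3 retained modes, D algebra components.
def spectral_params(C, F, m, D=8):
    complex_weights = F * C * C * m**3 * D
    return 2 * complex_weights      # count real and imaginary parts separately

base = spectral_params(C=25, F=2, m=8)      # the configuration described above
wider = spectral_params(C=25, F=2, m=14)    # growing m is cubic in this model
```

Cubic growth in \(m\) versus linear growth in \(F\) is consistent with the ablation discussion around Fig. 5.18, where the Fourier modes dominate model size.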

Pipeline Generalisability. Fengbo operates entirely within Clifford Algebra, which is naturally suited to a multidimensional treatment [99, 98, 43, 70]. The extension of a given architecture in Clifford Algebra to higher dimensions has been widely documented in the literature as one of the main advantages of these types of networks [17, 158, 159, 139]. The implementation of a 2D equivalent of our 3D Fengbo pipeline is straightforward, as detailed in Table 5.5. By projecting the convolutional operations from 3D to 2D and operating within either \(G(2, 0, 0)\) or \(G(0, 2, 0)\), we achieve a fully analogous approach for 2D problems with targets that include scalar fields or 2D vector fields. Just as Fengbo 3D is not limited to the task of 3D flow estimation, Fengbo 2D is not limited to 2D flows: it can be extended to any 2D PDE that establishes a mapping from the geometry to the physics of the problem.

Table 5.5 Comparison of Fengbo’s 3D and 2D configurations.

| | Fengbo 3D | Fengbo 2D |
|---|---|---|
| Algebra | \(G(3, 0, 0)\), \(G(0, 3, 0)\) | \(G(2, 0, 0)\), \(G(0, 2, 0)\) |
| Dimensionality (\(D\)) | 8 | 4 |
| Tensor Shape | \(C \times M \times M \times M \times D\) | \(C \times M \times M \times D\) |
| Complexity | \(\mathcal{O}(N \log N), N = M^3\) | \(\mathcal{O}(N \log N), N = M^2\) |
| Geometry Block | 3D Clifford Convolutions | 2D Clifford Convolutions |
| FNO | 3D full-grade Spectral Convolutions, 3 Fourier modes | 2D full-grade Spectral Convolutions, 2 Fourier modes |
| Physics Block | 3D Clifford Convolutions | 2D Clifford Convolutions |
| Normalisation | Group Normalisation 3D | Group Normalisation 2D |
| Targets | 2 scalar fields, 1 3D vector field, 1 3D bivector field | 2 scalar fields, 1 2D vector field |

We can consider 2D Fengbo as a simpler subcase of the 3D one, since the 3D implementation presents several additional challenges, namely:

  • 6-dimensional tensors in 3D, with shape \(B \times C \times M \times M \times M \times D\), where \(B\) is the batch size, \(C\) is the number of channels, \(M\) is the grid resolution and \(D\) is the algebra dimensionality, as opposed to 5-dimensional tensors in 2D, with shape \(B \times C \times M \times M \times D\), which require significantly less memory and allow for larger model sizes.
  • Larger algebra dimensionality, with \(D = 2^3 = 8\) elements in \(G(3, 0, 0)\), namely 1 scalar, 3 vectors, 3 bivectors and 1 trivector, as opposed to the 2D case with \(D = 4\), with 1 scalar, 2 vectors and 1 bivector. This has implications for the sparsity of the input tensors and their memory requirements, which negatively impact convergence.
  • Significantly higher computational cost: the asymptotic complexity is \(\mathcal{O}(N \log N)\) in both cases, but with \(N = M^3\) in 3D and \(N = M^2\) in 2D. Moving to 2D would allow for a larger grid resolution that can preserve a higher level of detail at a fraction of the computational cost.
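The memory gap between the two configurations is easy to quantify. Using the tensor shapes above with illustrative values \(B = 3\), \(C = 4\), \(M = 80\) (the batch size and channel count are assumptions for the sake of the example):

```python
# Element counts for the input tensors described above.
def tensor_elems(B, C, M, D, spatial_dims):
    return B * C * M**spatial_dims * D

elems_3d = tensor_elems(B=3, C=4, M=80, D=8, spatial_dims=3)  # B x C x M^3 x D
elems_2d = tensor_elems(B=3, C=4, M=80, D=4, spatial_dims=2)  # B x C x M^2 x D
ratio = elems_3d / elems_2d
```

At the same resolution, the 3D tensor holds \(M \cdot (8/4) = 160\times\) more elements than its 2D counterpart, which is what allows the 2D model to afford larger grids or wider layers for the same memory budget.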

Moreover, handling 3D datasets with the proposed pipeline inherently includes the capability to process the 2D datasets of [113, 188]. This is shown in Table 5.6. When processing instances in ShapeNet Car and Ahmed Body, we: 1) sample points from the unstructured meshes; 2) discretise the irregular point clouds onto regular grids; and 3) embed them in multivector form. Datasets like AirFRANS, which also contains unstructured meshes, would be processed in the same way. The remaining 6 datasets fall into data structures which are intermediate steps of the pipeline we established with Fengbo: point clouds, like the Elasticity dataset, can be directly discretised onto regular grids and embedded into multivector form; the regular grids of the Plasticity, Airfoil and Pipe datasets can be simply embedded as multivectors, as already demonstrated in [17, 138]; and the structured meshes of the Navier-Stokes and Darcy datasets can be processed like their unstructured counterparts, demonstrating Fengbo's generalisability to 2D cases.

Table 5.6 Data representations and their processing steps with the Fengbo pipeline.

| | Unstructured Meshes | Point Clouds | Regular Grid | Structured Mesh |
|---|---|---|---|---|
| Datasets | ShapeNet Car, Ahmed Body (3D), AirFRANS (2D) | Elasticity (2D) | Plasticity, Airfoil, Pipe (2D) | Navier-Stokes, Darcy (2D) |
| 1. Sample points from mesh | ✓ | - | - | ✓ |
| 2. Discretise onto regular grid | ✓ | ✓ | - | ✓ |
| 3. Embed in multivector form | ✓ | ✓ | ✓ | ✓ |

5.4.5 Training details

Fengbo was trained on 3 NVIDIA A100 GPUs with 40 GB of memory each. It was trained for 100 epochs with a batch size of 3, for a total of approximately 24 compute hours. We employed the Adam optimiser with default parameters \(\beta_1 = 0.9, \beta_2 = 0.999\) to update the model's weights. We adopted a learning rate of \(10^{-4}\), reduced on plateau by a factor of 2 with a patience of 8 epochs on the validation loss. The loss we minimised for the ShapeNet Car dataset is
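The reduce-on-plateau schedule can be sketched in a few lines (the real training loop presumably uses a framework scheduler such as PyTorch's `ReduceLROnPlateau`; this is just its core logic):

```python
# Minimal sketch: halve the learning rate when the validation loss has not
# improved for more than `patience` consecutive epochs.
class ReduceOnPlateau:
    def __init__(self, lr=1e-4, factor=0.5, patience=8):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

sched = ReduceOnPlateau()
sched.step(1.0)                  # first epoch sets the best validation loss
for _ in range(9):               # nine epochs without improvement...
    lr = sched.step(2.0)         # ...exceed the patience of 8 and halve lr
```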

\[\mathcal{L} = \alpha \mathcal{L}_P + \beta \mathcal{L}_V + \|\phi - \hat{\phi}\|_1 \quad (5.23)\]

with parameters \(\alpha = 5, \beta = 1\) picked empirically to weight the pressure component more, since it was harder to regress. A similar loss was employed for the Ahmed Body dataset:

\[\mathcal{L} = \alpha \mathcal{L}_P + \beta \|r - \hat{r}\|_1 + \|\phi - \hat{\phi}\|_1 \quad (5.24)\]

but with \(\alpha = 1, \beta = 1\) and \(r\) being the Reynolds number embedded as the trivector component of the output, i.e. \(\langle R' \rangle_3 = (m_p r) e_{123}\), where \(R'\) refers to the second output of the network, \(P'\) being the first. No velocity information is provided. The \(L_1\) norm term on the pressure was added to further penalise large deviations of the estimated pressure field \(\hat{\phi}\) from the ground truth.
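The two training objectives, Eqs. (5.23) and (5.24), translate directly into code. A minimal numpy sketch (fields are flattened sample vectors here; in training they would be the masked grid values):

```python
import numpy as np

def rel_l2(gt, est):
    """Relative L2 norm, as in Eqs. (5.21)-(5.22)."""
    return np.linalg.norm(gt - est) / np.linalg.norm(gt)

def shapenet_loss(phi, phi_hat, psi, psi_hat, alpha=5.0, beta=1.0):
    """Eq. (5.23): alpha * L_P + beta * L_V + ||phi - phi_hat||_1."""
    return (alpha * rel_l2(phi, phi_hat)
            + beta * rel_l2(psi, psi_hat)
            + np.abs(phi - phi_hat).sum())

def ahmed_loss(phi, phi_hat, r, r_hat, alpha=1.0, beta=1.0):
    """Eq. (5.24): alpha * L_P + beta * ||r - r_hat||_1 + ||phi - phi_hat||_1."""
    return (alpha * rel_l2(phi, phi_hat)
            + beta * np.abs(r - r_hat).sum()
            + np.abs(phi - phi_hat).sum())

phi = np.array([1.0, 2.0])       # toy pressure samples
psi = np.array([0.0, 1.0])       # toy velocity samples
```

A perfect prediction drives both losses to exactly zero, and the \(\alpha, \beta\) weights trade off the pressure term against the velocity (or Reynolds number) term.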

We employed elastic net regularisation, with lasso coefficient \(\lambda_1 = 10^{-5}\) and ridge coefficient \(\lambda_2 = 10^{-4}\), to encourage group selection of correlated features and reduce overfitting, which we found to be significant over such small training sets. For training, input geometries are normalised to the range \([-1, 1]\), vector velocity fields are normalised to the range \([0, 1]\), and pressure fields are normalised by subtracting their mean and dividing by their variance. Test metrics are measured over denormalised quantities to yield physically meaningful errors.
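The normalisation scheme just described can be sketched as follows. This is a simplified per-sample reading (the actual statistics might be computed per dataset rather than per sample); note the pressure is divided by the variance, as stated in the text, not the standard deviation:

```python
import numpy as np

def to_range(x, lo, hi):
    """Affinely map x onto [lo, hi]."""
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

def normalise_pressure(p):
    """Subtract the mean, divide by the variance (as described in the text)."""
    return (p - p.mean()) / p.var()

geom = to_range(np.array([0.0, 5.0, 10.0]), -1.0, 1.0)   # geometry -> [-1, 1]
vel  = to_range(np.array([-2.0, 0.0, 2.0]), 0.0, 1.0)    # velocity -> [0, 1]
pres = normalise_pressure(np.array([1.0, 3.0]))          # pressure, centred
```

At test time the inverse maps are applied before computing the metrics, so the reported errors refer to physical units.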

5.4.6 Ablation study

Table 5.7 Ablation on the impact of \(M\) . Reporting the relative \(L_2\) norm.

| Grid Size \(M\) | Pressure (ShapeNet Car) train / test | Velocity (ShapeNet Car) train / test | Pressure (Ahmed Body) train / test |
|---|---|---|---|
| 40 | 6.69 / 12.8 | 5.58 / 6.11 | 10.6 / 15.1 |
| 50 | 7.38 / 11.2 | 5.99 / 5.70 | 10.4 / 13.9 |
| 60 | 6.42 / 10.1 | 5.17 / 5.26 | 9.17 / 12.3 |
| 70 | 7.43 / 10.5 | 5.97 / 5.61 | 8.10 / 11.7 |
| 80 | 6.94 / 8.86 | 5.56 / 5.10 | 8.00 / 10.7 |

Pipeline Scalability. We study the impact of four components of the Fengbo pipeline, namely the grid size \(M\) , the number of hidden channels in the 3D Clifford FNO module \(C\) , the number of blocks in the FNO \(F\) and the number of modes in the FNO \(m\) .

The impact of grid size on Fengbo's accuracy is shown in Table 5.7. We train and test on volumes with the same resolution. It can be noted that, by reducing the number of voxels by 87.5% (from \(M = 80\) to \(M = 40\)), i.e. significantly reducing the level of detail in our input shape, Fengbo still yields a test error only 4% higher for the ShapeNet Car dataset and 4.5% higher for the Ahmed Body dataset. We can hence claim discretisation convergence.

Since the 3D Clifford FNO is Fengbo's key component, we study the impact of its parameters. In Fig. 5.15 we report the ablations over the number of hidden channels \(C\) within the FNO for the two datasets and for the relative \(L_2\) norm over the pressure and velocity fields \(\phi, \psi\), respectively. For the velocity field of ShapeNet Car we tested two combinations of the weighting coefficients \(\alpha, \beta\) of the loss function. The grid size is fixed to \(M = 80\) and the number of blocks within the FNO is fixed to \(F = 2\), while we test \(C = \{5, 10, 15, 20, 25\}\).

In all four scenarios presented, a higher number of hidden channels (i.e. a wider network) yields a steep decrease in the error at testing stage, showing how a version of Fengbo larger than that presented in Section 5.4 (\(C = 25\)) could further improve the quality of the PDE solution, and demonstrating its scalability with respect to \(C\).

![Fig. 5.15](d24d2a6c29ed7ac1827b2985ab9dac34_img.jpg)

Fig. 5.15 Ablation on the number of hidden channels \(C\). Relative \(L_2\) norm versus \(C\): (a) pressure, ShapeNet Car; (b) pressure, Ahmed Body; (c) velocity, ShapeNet Car, \(\{\alpha, \beta\} = \{5, 1\}\); (d) velocity, ShapeNet Car, \(\{\alpha, \beta\} = \{1, 50\}\). Approximate values read from the plots:

| \(C\) | (a) \(\phi\), ShapeNet Car | (b) \(\phi\), Ahmed Body | (c) \(\psi\), ShapeNet Car \(\{5, 1\}\) | (d) \(\psi\), ShapeNet Car \(\{1, 50\}\) |
|---|---|---|---|---|
| 5 | 10.2 | 11.8 | 6.2 | 5.8 |
| 10 | 9.6 | 11.7 | 5.6 | 4.6 |
| 15 | 9.5 | 11.4 | 5.5 | 4.2 |
| 20 | 9.3 | 10.9 | 4.8 | 3.8 |
| 25 | 8.8 | 10.8 | 5.1 | 3.5 |

In Fig. 5.16 we study the impact of the number of FNO blocks for the same four cases above. We tested \(F = \{1, 2, 3, 4\}\), keeping \(C = 15\) and \(M = 80\). In this scenario too, a deeper network corresponds to a more accurate estimation and hence scalability with respect to \(F\). Note, in Fig. 5.16a, a lower absolute minimum for the relative \(L_2\) norm over \(\phi\) for the ShapeNet Car dataset of 8.25% with \(F = 4\).

![Fig. 5.16](c24c5dfcbda80eb6b9d019d5dc04faf2_img.jpg)

Fig. 5.16 Ablation on the number of FNO blocks \(F\). Relative \(L_2\) norm versus \(F\): (a) pressure, ShapeNet Car; (b) pressure, Ahmed Body; (c) velocity, ShapeNet Car, \(\{\alpha, \beta\} = \{5, 1\}\); (d) velocity, ShapeNet Car, \(\{\alpha, \beta\} = \{1, 50\}\). Approximate values read from the plots:

| \(F\) | (a) \(\phi\), ShapeNet Car | (b) \(\phi\), Ahmed Body | (c) \(\psi\), ShapeNet Car \(\{5, 1\}\) | (d) \(\psi\), ShapeNet Car \(\{1, 50\}\) |
|---|---|---|---|---|
| 1 | 9.6 | 11.5 | 2.8 | 4.7 |
| 2 | 9.5 | 11.4 | 5.0 | 2.2 |
| 3 | 9.3 | 11.2 | 1.5 | 2.1 |
| 4 | 8.25 | 11.1 | 0.0 | 0.0 |

We then studied the impact of the number of Fourier modes \(m\) of the FNO. We tested \(m = \{4, 6, 8, 10, 12, 14\}\), keeping \(F = 2\) and \(C = 20\). In this case results are less uniform across the four test cases, but we can conclude that a larger number of modes often corresponds to similar, if not worse, performance, as already pointed out in [17].

![Fig. 5.17a](c3afdf6d44fa08949ae7df7bd616453f_img.jpg)
![Fig. 5.17b](89f8b2cdc98620285d02b02e6a273e94_img.jpg)
![Fig. 5.17c](ce24ea4de63aaab49f5e0a40d111ea42_img.jpg)
![Fig. 5.17d](eb069374425cf7d9138be718c1309fc4_img.jpg)

Fig. 5.17 Ablation on the number of Fourier modes \(m\). Relative \(L_2\) norm versus \(m\): (a) pressure, ShapeNet Car; (b) pressure, Ahmed Body; (c) velocity, ShapeNet Car, \(\{\alpha, \beta\} = \{5, 1\}\); (d) velocity, ShapeNet Car, \(\{\alpha, \beta\} = \{1, 50\}\). Approximate values read from the plots:

| \(m\) | (a) \(\phi\), ShapeNet Car | (b) \(\phi\), Ahmed Body | (c) \(\psi\), ShapeNet Car \(\{5, 1\}\) | (d) \(\psi\), ShapeNet Car \(\{1, 50\}\) |
|---|---|---|---|---|
| 4 | 9.32 | 11.4 | 5.18 | 3.92 |
| 6 | 9.25 | 10.8 | 5.05 | 3.85 |
| 8 | 9.6 | 10.9 | 5.12 | 4.08 |
| 10 | 9.4 | 11.2 | 5.0 | 3.88 |
| 12 | 9.3 | 10.7 | 4.9 | 3.82 |
| 14 | 9.8 | 10.9 | 5.15 | 4.08 |

The effect of these ablations on the number of model parameters and on the model size is reported in Fig. 5.18. The ablations on \(M\) are missing since \(M\) does not affect the model dimension. The model size scales exponentially with respect to \(C\) and \(m\), and linearly with respect to \(F\). The Fourier modes have the biggest impact on the model parameters, with \(m = 14\) corresponding to a \(13\times\) increase with respect to the first Fengbo configuration presented, without benefiting the test error. The number of blocks \(F\), on the other hand, corresponds to a relatively milder increase in model size while still providing a substantial improvement in performance. This shows how a deeper network can likely yield more robust and accurate predictions than those shown in Table 5.2.

0f125758037d96e776fac950aeae5c53_img.jpg

Figure 5.18: Number of parameters (M) and model size (MB) as a function of C, R, m. (a) Ablation on C: Number of hidden channels (C) from 5.0 to 25.0. (b) Ablation on F: Number of FNO blocks (F) from 1.0 to 4.0. (c) Ablation on m: Number of Fourier modes (m) from 4 to 14. All plots show that the number of parameters (M) increases linearly with the respective variable, while the model size (MB) increases much more slowly.

Fig. 5.18 Number of parameters (M) and model size (MB) as a function of \(C, F, m\) .

Lastly, the impact of the weighting coefficients of the loss function on Fengbo’s accuracy is shown in Table 5.8. We fix the resolution to \(M = 80\) and vary the weight attributed to different physical quantities in the loss function. \(\beta\) weights velocity for the ShapeNet Car dataset and the Reynolds number for the Ahmed Body dataset, while \(\alpha\) weights the pressure in both. Note how the Reynolds number does not contribute in a significant way to the estimation of pressure in the Ahmed Body dataset. From Table 5.8 we can conclude that the high accuracy of Fengbo also stems from the joint estimation of variables: regressing also on \(\psi\) can better constrain the values \(\phi\) can assume, and vice versa. \(\alpha, \beta\) can be thought of as two parameters that mix the quantities to be regressed and that can be tuned based on the specific requirements of the problem to be tackled, e.g. which quantity we wish to prioritise.

Table 5.8 Ablation on the impact of \(\alpha, \beta\) . Reporting the relative \(L_2\) norm.

| \(\alpha\) | \(\beta\) | ShapeNet Car, Pressure (train / test) | ShapeNet Car, Velocity (train / test) | Ahmed Body, Pressure (train / test) |
|---|---|---|---|---|
| 1 | 0 | 8.53 / 9.21 | - / - | 8.60 / 11.8 |
| 1 | 1 | 9.07 / 9.32 | 7.23 / 4.39 | 8.00 / 10.7 |
| 2 | 1 | 8.03 / 9.30 | 6.40 / 4.56 | 8.23 / 10.9 |
| 5 | 1 | 6.94 / 8.86 | 5.56 / 5.10 | 7.64 / 10.9 |
| 10 | 1 | 5.38 / 9.12 | 4.28 / 5.48 | 9.31 / 11.9 |
| 0 | 1 | - / - | 4.90 / 4.03 | - / - |
| 1 | 2 | 9.37 / 9.50 | 4.09 / 4.12 | 8.42 / 11.9 |
| 1 | 5 | 7.71 / 9.83 | 3.98 / 3.82 | 8.34 / 11.8 |
| 1 | 10 | 9.37 / 10.1 | 3.82 / 3.60 | 8.26 / 11.8 |
| 1 | 20 | 10.5 / 10.8 | 3.37 / 3.59 | - / - |
| 1 | 50 | 8.42 / 11.4 | 3.23 / 3.47 | - / - |

5.4.7 Notes on computational complexity

We focused primarily on Neural Operators over Transformers due to the latter's significantly higher computational complexity, namely \(\mathcal{O}(N^2)\) . Transformers, albeit offering improved performance, introduce substantial challenges in terms of resource requirements and scalability. To support this claim, we offer an analysis of the theoretical complexity of the Fengbo model and compare it with the current state of the art in Transformer-based solvers, the Transolver architecture [188], which to the best of our knowledge is both the most accurate and the least computationally expensive Transformer architecture designed to solve PDEs. Albeit mostly validated on 2D problems, the Transolver has also been tested on one 3D dataset, namely ShapeNet Car.

More specifically, the authors of [188] designed an ad hoc attention layer, the Physics-Attention layer, that operates on slices of the PDE domain \(\Omega_d\) . The reported computational complexity of this layer is \(\mathcal{O}(NSC + S^2C)\) , in which \(N\) is the number of mesh nodes, \(S\) is the number of slices into which the domain is partitioned and \(C\) is the number of hidden channels of the model. The authors claim a quasi-linear complexity with respect to \(N\) . However, the overall complexity is heavily dependent on the choice of the model parameters \(S\) and \(C\) , and for large values of \(S\) and \(C\) , which is the setting for most of the experiments in [188], it becomes sub-quadratic. We compare the model complexities for the 3D and the 2D case.

3D Case. Model complexities for the 3D case are shown in Fig. 5.19. For the ShapeNet Car, specifically, the reported parameters are as follows: \(N \simeq 32000\) mesh nodes, \(S = 32\) slices into which the car surface is partitioned and \(C = 256\) channels. This places the computational complexity of the Transolver architecture for the ShapeNet Car dataset at the green marker shown in Fig. 5.19a.

On the other hand, as shown in Table 5.4, we report a computational complexity of \(\mathcal{O}(N \log N)\) for the Fengbo pipeline. In our approach, \(N\) is the dimension of the 3D regular grid, hence \(N = M^3\) , with \(M\) being the grid resolution. This places the computational complexity of the Fengbo architecture for the ShapeNet Car dataset at the yellow marker shown in Fig. 5.19a.
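These two operating points are easy to reproduce. A minimal sketch, assuming a base-10 logarithm for the \(N \log N\) term (the choice of base only shifts a constant factor):

```python
import math

def fengbo_complexity(M):
    """O(N log N) with N = M^3 grid points (log base 10 assumed here)."""
    N = M ** 3
    return N * math.log10(N)

def transolver_complexity(N, S, C):
    """Physics-Attention cost O(NSC + S^2 C) reported in [188]."""
    return N * S * C + S ** 2 * C

fengbo = fengbo_complexity(80)                       # M = 80 grid resolution
transolver = transolver_complexity(32000, 32, 256)   # ShapeNet Car setting
print(f"Fengbo:     ~1e{int(math.log10(fengbo))}")      # prints ~1e6
print(f"Transolver: ~1e{int(math.log10(transolver))}")  # prints ~1e8
```

The orders of magnitude match the markers in Fig. 5.19a and the values in Table 5.9.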

Such design choices yielded the results shown in Fig. 5.19b: the Transolver model attains a 1.4% decrease in relative \(L_2\) error over Fengbo, but at a computational complexity two orders of magnitude larger. It is also worth noting that the experiment setting for Transolver followed the implementation of 3D-GeoCA [42], which takes 789 samples for training and 100 samples for testing. On the other hand, we followed the approach of GINO [112], in which we retain 611 watertight meshes and employ 500 samples for training and 100 for testing, meaning that the Transolver, besides being computationally more complex, was also trained on 57.8% more samples.

57589d1e3e86dfdc4cda1c34f7dafb92_img.jpg

Figure 5.19(a): A log-log plot titled 'Plot of N log N and NSC + S^2 C for various values of S'. The x-axis is labeled 'N' and ranges from 10^4 to 10^7. The y-axis is labeled 'Function Value' and ranges from 10^4 to 10^14. The plot shows several lines: a green line for N log N, a blue line for N^2, and several orange and brown lines for NSC + S^2 C with different values of S (8, 32, 64, 128, 256). Markers indicate the operating points for Fengbo (yellow star) and Transolver (pink star) for the ShapeNet Car dataset. Vertical dashed lines are drawn at N ≈ 3.5e4, 1e5, and 3.5e5.

(a) Computational complexities of Fengbo ( \(N \log N\) ) and of Transolver ( \(NSC + S^2 C\) ), for different values of \(S\) . Fengbo operates at \(N_1\) , while the Transolver operates at \(N_2\) in its 3D setting. Linear and quadratic complexity curves added for reference.

| | Fengbo | Transolver |
|---|---|---|
| Compl. | \(\mathcal{O}(10^6)\) | \(\mathcal{O}(10^8)\) |
| \(L_2\) ( \(\phi\) ) | 8.86 | 7.45 |
| \(L_2\) ( \(\psi\) ) | 3.47 | 2.07 |

(b) Comparison of Fengbo and Transolver in terms of computational complexity (yellow and green marker in the figure, respectively) and test \(L_2\) norm for the pressure and velocity field over the ShapeNet Car dataset. Pressure \(L_2\) obtained with \(\{\alpha, \beta\} = \{5, 1\}\) , Velocity \(L_2\) obtained with \(\{\alpha, \beta\} = \{1, 50\}\)

Fig. 5.19 Computational complexity comparison of Fengbo and Transolver [188] for the 3D case.

A similar claim can be made for the Ahmed Body dataset, not analysed in [188], but whose complexity can still be studied. Assuming \(N = 100000\) mesh nodes and the same parameter configuration employed for the ShapeNet Car dataset, i.e. \(\{S, C\} = \{32, 256\}\) , the resulting complexity of the model also reaches \(\mathcal{O}(10^8)\) , corresponding to the pink marker in Fig. 5.19a. With Fengbo, the grid resolution is kept unchanged for the Ahmed Body dataset, which yields a complexity identical to that of the ShapeNet Car dataset, demonstrating its robustness to larger mesh sizes.

Table 5.9 Complexity comparison for different datasets

| | Darcy | Elasticity | Plasticity | AirFoil | Navier Stokes | Pipes | ShapeNet Car, Ahmed Body (Fengbo) |
|---|---|---|---|---|---|---|---|
| \(N\) | 7225 | 976 | 3131 | 11271 | 16641 | 4096 | \(80^3\) |
| \(S\) | 64 | 64 | 64 | 64 | 64 | 32 | - |
| \(C\) | 128 | 128 | 128 | 128 | 128 | 256 | - |
| Complexity | \(\mathcal{O}(10^6)\) | \(\mathcal{O}(10^7)\) | \(\mathcal{O}(10^8)\) | \(\mathcal{O}(10^8)\) | \(\mathcal{O}(10^8)\) | \(\mathcal{O}(10^8)\) | \(\mathcal{O}(10^6)\) |

a756b11a82d229c04c8c5d78f40ff16d_img.jpg

Figure 5.20: Computational Complexity versus L2 promotion for different datasets. The plot shows 'Promotion' on the y-axis (log scale from 10^0 to 2x10^0) versus 'Computational Complexity' on the x-axis (log scale from 10^6 to 10^9). Data points for various datasets are plotted, and curves represent interpolations. A red arrow and ellipse highlight the 'Fengbo' ablations, which are clustered at lower complexity and promotion values. A black arrow and ellipse highlight the 'Transolver' ablations, which are clustered at higher complexity and promotion values. The legend identifies the datasets: Darcy - Transolver, Elasticity - Transolver, Plasticity - Transolver, Airfoil - Transolver, Navier-Stokes - Transolver, Pipes - Transolver, ShapeNet Car, P - Fengbo, ShapeNet Car, V - Fengbo, Ahmed Body, P - Fengbo, and Ahmed Body, V - Fengbo.

Fig. 5.20 Relative \(L_2\) norm promotion versus computational complexity for Fengbo (red arrow and ellipse) and Transolver's (black arrow and ellipse) ablations.

2D Case. We compare the ablations presented in this Section with those presented in Appendix C of [188]. We compute the corresponding Fengbo complexity as the grid size varies, i.e. \(M = \{40, 50, 60, 70, 80\}\) , and the corresponding Transolver complexity as \(N\) and \(C\) vary across datasets and as the number of slices \(S\) employed across ablations varies, namely \(S = \{1, 8, 16, 32, 64, 96, 128, 256, 512, 1024\}\) . This is summarised in Table 5.9. We then plot the relative \(L_2\) norm promotion versus the resulting complexities for six 2D datasets (Darcy, Elasticity, Plasticity, Airfoil, Navier-Stokes and Pipes) and two 3D datasets (ShapeNet Car and Ahmed Body). The promotion is defined as \(\mathcal{P}_i = \mathcal{L}_i / \mathcal{L}^*\) , with \(\mathcal{L}_i\) the relative \(L_2\) norm reported for the \(i\) th ablation and \(\mathcal{L}^*\) the overall minimum \(L_2\) norm reported. We do so as different datasets might present very different ranges of \(\mathcal{L}\) .
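The promotion metric is a simple normalisation. As an illustration (not the exact plotting code), applying it to the testing velocity \(L_2\) norms of the \(\{\alpha, \beta\}\) ablations reported in Table 5.8:

```python
# Testing relative L2 norms on velocity (ShapeNet Car), from Table 5.8.
l2_values = [4.39, 4.56, 5.10, 5.48, 4.03, 4.12, 3.82, 3.60, 3.59, 3.47]

# Promotion P_i = L_i / L*, with L* the overall minimum L2 norm reported.
l_star = min(l2_values)
promotion = [l / l_star for l in l2_values]

print(min(promotion))  # prints 1.0: the best ablation has promotion 1 by construction
```

Because every curve bottoms out at promotion 1, ablations of different datasets become directly comparable despite very different absolute \(\mathcal{L}\) ranges.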

This is shown in Fig. 5.20, in which the markers represent different ablations and the curves are the resulting interpolations. Note how, even when compared to 2D datasets, Fengbo still operates at one order of magnitude below Transolver for the Elasticity dataset and at two orders of magnitude below for the remaining five datasets. On top of that, the computational cost to lower \(\mathcal{L}\) to the optimal value is also significantly lower for the Fengbo pipeline. Additionally, if Fengbo were to be tested in a 2D scenario, the value of \(N\) would likely be much smaller than \(10^6\) .

Part II: Electromagnetism

5.5 STAResNet: working in STA to solve Maxwell's PDEs

The question we address in this Section is: how does the choice of the algebra in which data is embedded affect the accuracy of the PDE solution via a GA network? We focus on Maxwell's PDEs, which describe the fundamental behavior of electric and magnetic fields, and compare their solutions using two approaches: one in \(n\) D GA and one in \((n+1)\) D Spacetime Algebra (STA).

By comparing these two approaches, we aim to understand the impact of the algebraic framework on the accuracy and efficiency of the solutions. The choice of algebra can significantly influence the complexity of the problem formulation and the performance of the neural network, potentially leading to more accurate and efficient PDE solvers.

5.5.1 Problem definition

Maxwell's equations describe the behaviour of electric and magnetic fields in classical electromagnetism. They read as follows (for the vacuum case):

\[\nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0} \quad (5.25)\]

known as Gauss's Law for Electricity and states that the divergence of the electric field ( \(\mathbf{E}\) ) is equal to the charge density ( \(\rho\) ) divided by the permittivity of free space ( \(\varepsilon_0\) ).

\[\nabla \cdot \mathbf{B} = 0 \quad (5.26)\]

representing Gauss's Law for Magnetism and indicating that the divergence of the magnetic field ( \(\mathbf{B}\) ) is zero, implying the absence of magnetic monopoles.

\[\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \quad (5.27)\]

i.e. Faraday's Law of Induction, stating that the curl of the electric field ( \(\mathbf{E}\) ) is equal to the negative rate of change of the magnetic field ( \(\mathbf{B}\) ) with respect to time.

\[\nabla \times \mathbf{B} - \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} \quad (5.28)\]

which is Ampère's Law, stating that the curl of the magnetic field ( \(\mathbf{B}\) ) is equal to the permeability of free space ( \(\mu_0\) ) times the sum of the current density ( \(\mathbf{J}\) ) and the permittivity ( \(\varepsilon_0\) ) times the rate of change of the electric field ( \(\mathbf{E}\) ) with respect to time. To simplify the exposition we will work in natural units ( \(c = \varepsilon_0 = \mu_0 = 1\) ), so that Maxwell's equations become:

\[\nabla \cdot \mathbf{E} = \rho \quad \nabla \cdot \mathbf{B} = 0 \quad \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \quad \nabla \times \mathbf{B} = \frac{\partial \mathbf{E}}{\partial t} + \mathbf{J} \quad (5.29)\]
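As a quick sanity check on the natural-unit form, a plane wave can be verified symbolically to satisfy all four vacuum equations ( \(\rho = 0\) , \(\mathbf{J} = 0\) ). A minimal sketch using sympy, with the wave chosen by hand (it is not taken from the dissertation):

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')

# Plane wave propagating along z in natural units (c = 1):
# E along x, B along y, both proportional to cos(z - t).
E = sp.Matrix([sp.cos(z - t), 0, 0])
B = sp.Matrix([0, sp.cos(z - t), 0])

def curl(F):
    return sp.Matrix([
        sp.diff(F[2], y) - sp.diff(F[1], z),
        sp.diff(F[0], z) - sp.diff(F[2], x),
        sp.diff(F[1], x) - sp.diff(F[0], y),
    ])

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

# All four vacuum equations of Eq. 5.29 hold.
assert div(E) == 0 and div(B) == 0
assert sp.simplify(curl(E) + sp.diff(B, t)) == sp.zeros(3, 1)
assert sp.simplify(curl(B) - sp.diff(E, t)) == sp.zeros(3, 1)
```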

Maxwell's equations in \(\mathcal{G}_{3,0,0}\)

This scenario will be our baseline, and it is equivalent to that proposed in [17], using Euclidean 3-space. Since this approach has already been shown to outperform a real-valued ResNet, we will largely omit direct comparisons with it, except in Figs. 5.24 and 5.28.

For a full derivation, we refer the reader to [131]. \(\mathcal{G}(3,0,0)\) is the three-dimensional Geometric Algebra spanned by the three basis vectors \(\{e_1, e_2, e_3\}\) . By defining the pseudoscalar \(i = e_{123}\) we can introduce the electromagnetic (EM) field multivector F:

\[\mathbf{F} = \mathbf{E} + i\mathbf{B} = E_1e_1 + E_2e_2 + E_3e_3 + B_1e_{23} + B_2e_{13} + B_3e_{12}. \quad (5.30)\]

With this object it is then possible to write Eq. 5.25-5.28 into a set of two equations only:

\[\nabla \cdot \mathbf{F} = \rho \quad \nabla \times \mathbf{F} = i \left( \frac{\partial \mathbf{F}}{\partial t} + \mathbf{J} \right) \quad (5.31)\]

which, using the geometric product, can be further reduced to:

\[\left( \frac{\partial}{\partial t} + i\nabla \right) \mathbf{F} = \mathbf{J} - i\rho \quad (5.32)\]

In this scenario, we formulate the PDE solution as a 3D multivector-to-multivector regression problem. The inputs to the network will be a pair of multivectors sampled at two consecutive time instants \(\{\mathbf{F}_i, \mathbf{F}_{i+\Delta t}\}\) , while the label will be the multivector after a history of 2 time steps, i.e. \(\mathbf{F}_{i+2\Delta t}\) . This is summarized in Fig. 5.21.
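The multivector bookkeeping behind this setup is straightforward: per Eq. 5.30, the six field components occupy the vector and bivector slots of an 8-component \(G(3,0,0)\) multivector, while the scalar and pseudoscalar slots stay zero. A minimal NumPy sketch (the function and array names are illustrative, not from the dissertation's code):

```python
import numpy as np

# Blade order assumed here for G(3,0,0): [1, e1, e2, e3, e23, e13, e12, e123].
def fields_to_multivector(E, B):
    """Pack E, B fields of shape (3, H, W, D) into an 8-component multivector grid."""
    F = np.zeros((8,) + E.shape[1:])
    F[1:4] = E   # E1 e1, E2 e2, E3 e3 (vector part)
    F[4:7] = B   # B1 e23, B2 e13, B3 e12 (bivector part)
    return F

# Toy 4x4x4 grid: two consecutive frames as input, the third as regression target.
rng = np.random.default_rng(0)
frames = [fields_to_multivector(rng.normal(size=(3, 4, 4, 4)),
                                rng.normal(size=(3, 4, 4, 4))) for _ in range(3)]
inputs = np.stack(frames[:2])   # {F_i, F_{i+dt}}, shape (2, 8, 4, 4, 4)
target = frames[2]              # F_{i+2dt},       shape (8, 4, 4, 4)
```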

decc27040435a188fd6734a9002e5312_img.jpg

Diagram illustrating the 3D GA approach for solving Maxwell's PDEs using Clifford ResNet. The input consists of two groups of fields at times \(t_0\) and \(t_0 + \Delta t\) . The first group ( \(E_1, E_2, E_3\) ) is associated with the basis vectors \(e_1, e_2, e_3\) ; the second group ( \(B_1, B_2, B_3\) ) with the bivectors \(e_{23}, e_{13}, e_{12}\) . These are combined into a vector + bivector \(\mathbf{F} = \mathbf{E} + i\mathbf{B}\) in \(G(3,0,0)\) and processed by a Clifford ResNet block in \(G(3,0,0)\) . The output shows the fields at time \(t_0 + 2\Delta t\) , with the same basis vectors and bivectors.

Fig. 5.21 3D GA approach: solving Maxwell's PDEs through Clifford ResNet, a ResNet-inspired network working in \(\mathcal{G}_{3,0,0}\) .

Maxwell's equations in \(\mathcal{G}_{1,3,0}\)

However, the previous section employed an embedding with limited physical significance. The Spacetime Algebra (STA) framework provides a mathematically natural representation for Maxwell's equations. STA is formally denoted as \(G(1, 3, 0)\) , generated by four basis vectors \(\{\gamma_\mu\}_{\mu=0}^3\) that satisfy the defining relations \(\gamma_0^2 = 1\) , \(\gamma_k^2 = -1\) for \(k \in \{1, 2, 3\}\) , and \(\gamma_\mu \gamma_\nu = -\gamma_\nu \gamma_\mu\) for \(\mu \neq \nu\) .

In this framework, the electric and magnetic fields are represented as spatial bivectors:

\[\mathbf{E} = E_1 \sigma_1 + E_2 \sigma_2 + E_3 \sigma_3 \quad (5.33)\]

\[\mathbf{B} = B_1 \sigma_1 + B_2 \sigma_2 + B_3 \sigma_3 \quad (5.34)\]

where \(\sigma_k = \gamma_k \gamma_0\) are the spacetime bivector basis elements. Note that while our electric and magnetic fields have the same coefficients as in the previous section, here they are bivectors rather than vectors. The spacetime gradient operator \(\nabla\) is defined as:

\[\nabla = \gamma^\mu \frac{\partial}{\partial x^\mu} = \gamma^0 \frac{\partial}{\partial t} + \gamma^i \frac{\partial}{\partial x^i} \quad (5.35)\]

where summation over repeated indices is implied, with \(\mu \in \{0, 1, 2, 3\}\) and \(i \in \{1, 2, 3\}\) . This operator enables the formulation of Maxwell's equations as a single geometric equation:

\[\nabla F = J \quad (5.36)\]

where the Faraday bivector \(F = \mathbf{E} + I\mathbf{B}\) encodes both electric and magnetic fields, with \(I = \gamma_0\gamma_1\gamma_2\gamma_3\) being the pseudoscalar unit, and \(J = (\rho - \mathbf{J})\gamma_0\) represents the four-current.

The compact form in Eq. 5.36 is not merely notational convenience compared to Eq. 5.25-5.28; it expresses a fundamental geometric relationship through the invertible geometric product, facilitating analytical solutions for electromagnetic propagation in conducting media.

A principal advantage of the STA formulation is its covariance under Lorentz transformations. Within this framework, a proper orthochronous Lorentz transformation is represented by an STA rotor \(R\) satisfying \(R\tilde{R} = 1\) , where \(\tilde{R}\) denotes the reversion operation. Under such a transformation, geometric objects transform covariantly through the adjoint action:

\[F \mapsto F' = RFR^{-1} = RF\tilde{R} \quad (5.37)\]

This transformation law preserves the algebraic structure of Maxwell's equation since:

\[\nabla'F' = R(\nabla F)\tilde{R} = RJ\tilde{R} = J' \quad (5.38)\]

where \(\nabla'\) and \(J'\) are the transformed gradient and current, respectively. This demonstrates that the equation \(\nabla F = J\) remains form-invariant across all inertial reference frames. For more details relating to Eq. 5.38, including how \(\nabla\) transforms, see [43].

The STA formulation reveals the Lorentz invariants of the electromagnetic field as intrinsic geometric quantities. The scalar and pseudoscalar parts of \(F^2\) correspond to the field invariants:

\[F^2 = (\mathbf{E} + I\mathbf{B})^2 = \mathbf{E}^2 - \mathbf{B}^2 + 2(\mathbf{E} \cdot \mathbf{B})I \quad (5.39)\]

where \(\mathbf{E}^2 - \mathbf{B}^2\) is the scalar invariant and \(\mathbf{E} \cdot \mathbf{B}\) is the pseudoscalar invariant. Note that the scalar and pseudoscalar invariants are the same for both \(G(3, 0, 0)\) and \(G(1, 3, 0)\) .
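These invariants can be checked numerically without any GA machinery, using the standard transformation of \(\mathbf{E}\) and \(\mathbf{B}\) under a boost along \(x\) with speed \(v\) (natural units, \(c = 1\) ). A sketch with arbitrarily chosen field values:

```python
import numpy as np

def boost_x(E, B, v):
    """Transform E, B fields under a Lorentz boost along x (c = 1)."""
    g = 1.0 / np.sqrt(1.0 - v ** 2)
    Ep = np.array([E[0], g * (E[1] - v * B[2]), g * (E[2] + v * B[1])])
    Bp = np.array([B[0], g * (B[1] + v * E[2]), g * (B[2] - v * E[1])])
    return Ep, Bp

E = np.array([1.0, 2.0, 3.0])
B = np.array([0.5, -1.0, 2.0])
Ep, Bp = boost_x(E, B, v=0.6)

# Scalar invariant E^2 - B^2 and pseudoscalar invariant E . B are frame-independent.
print(E @ E - B @ B, Ep @ Ep - Bp @ Bp)   # both ≈ 8.75
print(E @ B, Ep @ Bp)                     # both ≈ 4.5
```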

This formulation shows that electromagnetism is fundamentally a relativistic field theory, where electric and magnetic fields are frame-dependent projections of a single covariant entity in spacetime. The STA approach thus provides not only computational efficiency but also deeper physical insight into the geometric structure of electromagnetic phenomena.

We hence wish to show that a neural network for predicting EM fields working in the STA performs better than one working in a less natural embedding in \(G(3,0,0)\) ; to do this we will keep data and architectures identical. In the STA scenario, we formulate the PDE solution as a spacetime bivector-to-bivector regression problem. The spacetime bivector has the form:

\[F = \mathbf{E} + I\mathbf{B} = E_1\gamma_{10} + E_2\gamma_{20} + E_3\gamma_{30} + B_1\gamma_{23} + B_2\gamma_{13} + B_3\gamma_{12}. \quad (5.40)\]

The inputs to the network will be a pair of Faraday bivectors sampled at two consecutive time instants \(\{F_i, F_{i+\Delta t}\}\) , while the label will be the bivector after a history of 2 time steps, i.e. \(F_{i+2\Delta t}\) . The STA approach via STAResNet is summarized in Fig. 5.22. Note how the difference with respect to the approach in 2.2 is exclusively the mathematical framework the network works in.

e33485d4301404e5a43a61e4cfb96fcc_img.jpg

The diagram illustrates the STA approach for solving Maxwell's PDEs using STAResNet. It shows the flow of spacetime bivectors over time steps \(t_0\) , \(t_0 + \Delta t\) , and \(t_0 + 2\Delta t\) . The inputs at \(t_0\) and \(t_0 + \Delta t\) are processed by STAResNet \(G(1,3,0)\) to produce the output at \(t_0 + 2\Delta t\) .

The inputs at \(t_0\) and \(t_0 + \Delta t\) are represented as a pair of Faraday bivectors \(F = \mathbf{E} + I\mathbf{B}\) in the spacetime bivector space \(G(1,3,0)\) . The components of these bivectors are shown as a grid of 12 images, labeled \(E_1, E_2, E_3, B_1, B_2, B_3\) for \(t_0\) and \(\gamma_{10}, \gamma_{20}, \gamma_{30}, \gamma_{23}, \gamma_{13}, \gamma_{12}\) for \(t_0 + \Delta t\) .

The STAResNet \(G(1,3,0)\) processes these inputs to produce the output at \(t_0 + 2\Delta t\) , which is also a Faraday bivector \(F\) with components \(E_1, E_2, E_3, B_1, B_2, B_3\) shown as a grid of 6 images.


Fig. 5.22 STA approach: solving Maxwell's PDEs through STAResNet, our ResNet-inspired network working in \(\mathcal{G}_{1,3,0}\) .

5.5.2 Architecture: a ResNet in STA

We study 2D and 3D Maxwell's PDEs. In the 2D case, we compare the 2D Clifford ResNet and the 2D STAResNet: 2D Clifford ResNet, which handles multivectors \(F\) in \(G(2,0,0)\) , has 20 blocks of 2D convolutions paired with a ReLU activation function. Each convolutional layer has \(C = 32\) channels, except the first one with 2 channels and the last one with 1 channel, corresponding to the number of input and output time steps, respectively.

The 2D STAResNet, which handles bivectors \(F\) in \(G(1,2,0)\) , shares the same structure as the 2D Clifford ResNet, except for the embedding in a different mathematical space and the number of channels, reduced to \(C = 24\) . This has been done to make sure both networks have the same number of parameters for a fair comparison. Both networks have slightly below 1M parameters.

In the 3D case, the 3D Clifford ResNet sits in \(G(3,0,0)\) while the 3D STAResNet sits in \(G(1,3,0)\) . The architectures are identical to their 2D counterparts, with the only exceptions that 2D convolutions are replaced with 3D convolutions and the number of channels is reduced from 32 to 11 in the Clifford ResNet and from 24 to 8 in the STAResNet, for a total of approximately 600,000 parameters for both networks.

The number of channels has been established based on (i) computational resources and (ii) fairness of comparison, in terms of total number of parameters, between the GA and STA approaches.

Training details

Both networks have been trained for 50 epochs with a batch size of 32 in the 2D case and of 2 in the 3D case. We chose the Adam optimiser with a learning rate of \(10^{-3}\) . The objective function to be minimised is the mean squared error (MSE) between ground truth fields \(\mathbf{E}_{i+2\Delta t}, \mathbf{B}_{i+2\Delta t}\) and estimated fields \(\hat{\mathbf{E}}_{i+2\Delta t}, \hat{\mathbf{B}}_{i+2\Delta t}\) , which in 2D is defined as

\[\mathcal{L}_{2D} = \mathcal{L}_{E_x} + \mathcal{L}_{E_y} + \mathcal{L}_{B_z} \quad (5.41)\]

where each component is defined as:

\[\mathcal{L}_{E_x} = \frac{1}{MN} \sum_{m=0}^M \sum_{n=0}^N \left( E_{xmn,i+2\Delta t} - \hat{E}_{xmn,i+2\Delta t} \right)^2 \quad (5.42)\]

\[\mathcal{L}_{E_y} = \frac{1}{MN} \sum_{m=0}^M \sum_{n=0}^N \left( E_{ymn,i+2\Delta t} - \hat{E}_{ymn,i+2\Delta t} \right)^2 \quad (5.43)\]

\[\mathcal{L}_{B_z} = \frac{1}{MN} \sum_{m=0}^M \sum_{n=0}^N \left( B_{zmn,i+2\Delta t} - \hat{B}_{zmn,i+2\Delta t} \right)^2 \quad (5.44)\]

and in which \(m, n\) indicate the spatial location of the field within the 2D surface. In the 3D case, we have

\[\mathcal{L}_{3D} = \frac{1}{LMN} \sum_j \sum_{l=0}^L \sum_{m=0}^M \sum_{n=0}^N (E_{jlmn,i+2\Delta t} - \hat{E}_{jlmn,i+2\Delta t})^2 + (B_{jlmn,i+2\Delta t} - \hat{B}_{jlmn,i+2\Delta t})^2, \quad (5.45)\]

1eba201f682d15a30b207c0d68cfb6d5_img.jpg

Plot (a) showing training and validation losses for 2D Maxwell's PDEs at Δt = 25s. The y-axis is 'Loss' on a log scale from 6e-5 to 1e-5. The x-axis is 'Epochs' from 0 to 50. Four lines are shown: Training G(2,0,0) (solid blue), Val G(2,0,0) (dashed blue), Training G(1,2,0) (solid orange), and Val G(1,2,0) (dashed orange). All losses decrease over time, with the orange lines showing more initial fluctuations.

(a) \(\Delta t = 25\text{s}\)

b11c2eb778491a9a07084b02124d1faa_img.jpg

Plot (b) showing training and validation losses for 2D Maxwell's PDEs at Δt = 50s. The y-axis is 'Loss' on a log scale from 1e-5. The x-axis is 'Epochs' from 0 to 50. Four lines are shown: Training G(2,0,0) (solid blue), Val G(2,0,0) (dashed blue), Training G(1,2,0) (solid orange), and Val G(1,2,0) (dashed orange). The training loss for G(2,0,0) shows a sharp spike around epoch 18.

(b) \(\Delta t = 50\text{s}\)

f4643d7847149c41d329ae12f8e2a519_img.jpg

Plot (c) showing training and validation losses for 2D Maxwell's PDEs at Δt = 75s. The y-axis is 'Loss' on a log scale from 2e-5 to 1e-4. The x-axis is 'Epochs' from 0 to 50. Four lines are shown: Training G(2,0,0) (solid blue), Val G(2,0,0) (dashed blue), Training G(1,2,0) (solid orange), and Val G(1,2,0) (dashed orange). The losses are relatively stable after an initial drop.

(c) \(\Delta t = 75\text{s}\)

a95790215cc1f3478ca3861dfea46575_img.jpg

Plot (d) showing training and validation losses for 2D Maxwell's PDEs at Δt = 100s. The y-axis is 'Loss' on a log scale from 1e-3 to 1e-4. The x-axis is 'Epochs' from 0 to 50. Four lines are shown: Training G(2,0,0) (solid blue), Val G(2,0,0) (dashed blue), Training G(1,2,0) (solid orange), and Val G(1,2,0) (dashed orange). The training loss for G(2,0,0) shows a sharp spike around epoch 18, similar to plot (b).

(d) \(\Delta t = 100\text{s}\)

Fig. 5.23 Training and validation losses versus number of epochs for 2D Maxwell's PDEs for instances sampled at (a) 25s, (b) 50s, (c) 75s, (d) 100s.

in which \(l, m, n\) indicate the spatial location of the field within the 3D volume and \(j = \{x, y, z\}\) . Note how this is equivalent to measuring the loss over the real coefficients of ground truth and estimated \(\mathbf{F}, \hat{\mathbf{F}}\) , for the vanilla GA approaches, and the real coefficients of ground truth and estimated \(F, \hat{F}\) , for the STA approach. The MSE has the same units as the squared Faraday bivector, namely volts squared per meter squared ( \(V^2/m^2\) ). This applies uniformly to all figures throughout the manuscript. The code has been implemented via tensorflow 2.13.1 and accelerated with CUDA 12.2 on an NVIDIA GeForce RTX 4090 GPU.
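The objective of Eq. 5.41-5.44 is just a sum of per-component mean squared errors over the grid. A minimal NumPy sketch with hypothetical field arrays (not the tensorflow training code):

```python
import numpy as np

def mse_2d(T, T_hat):
    """Eq. 5.41: sum of per-component MSEs over a (3, M, N) field stack
    holding [E_x, E_y, B_z] at time i + 2*dt."""
    return sum(np.mean((T[c] - T_hat[c]) ** 2) for c in range(T.shape[0]))

rng = np.random.default_rng(0)
T = rng.normal(size=(3, 32, 32))              # ground truth [E_x, E_y, B_z]
T_hat = T + 0.01 * rng.normal(size=T.shape)   # a hypothetical noisy prediction
loss = mse_2d(T, T_hat)
print(loss)  # small positive value
```

The 3D loss of Eq. 5.45 follows the same pattern with an extra spatial axis and a sum over \(j = \{x, y, z\}\) .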

2b56a53620616729fb852d9ac827e8bd_img.jpg

(a) MSE (↓) versus \(\Delta t\) . (b) Correlation (↑) versus \(\Delta t\) .

Figure 5.24: (a) Mean squared error (MSE) and (b) correlation between estimated and ground truth EM fields in the test set for varying Δt. The figure contains two bar charts. Chart (a) shows MSE on the y-axis (ranging from 0.0E+00 to 2.8E-05) against Δt (s) on the x-axis (values: 25, 50, 75, 100). Chart (b) shows Correlation on the y-axis (ranging from 0.85 to 0.98) against Δt (s) on the x-axis (values: 25, 50, 75, 100). Both charts compare three models: STAResNet - G(1,2,0) (pink), Clifford ResNet - G(2,0,0) (blue), and ResNet (yellow). In (a), MSE generally increases with Δt, with STAResNet - G(1,2,0) showing the lowest MSE. In (b), correlation generally decreases with Δt, with STAResNet - G(1,2,0) maintaining the highest correlation.

Fig. 5.24 (a) Mean squared error and (b) correlation between estimated and ground truth EM fields in the test set for varying \(\Delta t\) . We included real-valued ResNet for a more comprehensive comparison.

5.5.3 Experiments in 2D

Impact of sampling period

The first scenario we look at is the solution of the Maxwell PDEs for varying sampling time \(\Delta t\) of the EM field. The datasets have been generated with a Finite-Difference Time-Domain (FDTD) solver following the specifications of [17]. We consider a surface with spatial resolution of \(32 \times 32\) and step size \(\Delta x = \Delta y = 5 \times 10^{-7} \text{m}\) , with the EM field sampled at varying sampling period \(\Delta t = \{25, 50, 75, 100\} \text{s}\) . The light is propagated from 6 point sources randomly placed in the \(xy\) plane. The wavelength of the emitted light is \(\lambda = 10^{-5} \text{m}\) . Each light source emits light with a random phase and random amplitude. We chose open boundary conditions to provide the most physically accurate representation of unbounded space by allowing electromagnetic waves to propagate beyond the computational domain without artificial reflections.
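For intuition on how an FDTD solver produces such sequences, the textbook 1D Yee scheme leapfrogs E and B on staggered grids. The sketch below is a generic illustration in dimensionless units, not the exact solver configuration of [17]:

```python
import numpy as np

nx, steps = 200, 300
courant = 0.5            # dt/dx ratio, below the 1D stability limit of 1
Ey = np.zeros(nx)
Bz = np.zeros(nx)

for n in range(steps):
    # Leapfrog updates: B is staggered half a cell / half a step from E.
    Bz[:-1] += courant * (Ey[1:] - Ey[:-1])
    Ey[1:] += courant * (Bz[1:] - Bz[:-1])
    # Soft Gaussian point source at the centre of the domain.
    Ey[nx // 2] += np.exp(-((n - 30) / 10.0) ** 2)

print(np.abs(Ey).max())  # the pulse propagates without blowing up
```

Consecutive snapshots of such a simulation, taken every \(\Delta t\) , form the input/target frames used to train the networks.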

The training set includes 30000 frames coming from 32 different sequences of the EM field. For each sequence we employ the first two samples, i.e. the first two time steps, as input to the networks and the third as target. The validation and test sets are structured similarly, and they include 3000 and 2400 frames, respectively.

The training and validation loss profiles for the four \(\Delta t\) cases are shown in Fig. 5.23. STAResNet consistently achieves lower loss at both training and validation stages compared to its 2D GA counterpart.

We measure two quantities at the testing stage: the MSE as defined in Eq. 5.41 and the correlation index, defined in 2D as

\[r_{2D} = \frac{1}{MN} \sum_{m=0}^M \sum_{n=0}^N T_{mn} \hat{T}_{mn} \quad (5.46)\]

where \(T = [E_{x,i+2\Delta t}, E_{y,i+2\Delta t}, B_{z,i+2\Delta t}]\) denotes the ground truth EM fields, \(\hat{T} = [\hat{E}_{x,i+2\Delta t}, \hat{E}_{y,i+2\Delta t}, \hat{B}_{z,i+2\Delta t}]\) the estimated EM fields and \(T_{mn}\) indicates the fields evaluated at position \((m, n)\) on the 2D grid. Results are summarised in Fig. 5.24. As expected, the MSE increases as more time passes between successive frames. Correlation also increases, because for large \(\Delta t\) the fields tend to dissipate, presenting less busy patterns that result in higher correlation despite a worse estimation of the fields. STAResNet consistently performs better regardless of the sampling period \(\Delta t\) . As already demonstrated in [17], a GA-embedded architecture consistently outperforms its real-valued counterpart.
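A direct transcription of Eq. 5.46 is a mean of elementwise products (as written, not a normalised correlation coefficient). A sketch, averaging over the three field components as well for simplicity:

```python
import numpy as np

def correlation_2d(T, T_hat):
    """Eq. 5.46 (up to the component-count factor): mean over the grid of the
    elementwise product of ground truth and estimated [E_x, E_y, B_z] stacks."""
    return np.mean(T * T_hat)

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 32, 32))
print(correlation_2d(T, T) > 0)    # prints True: a field correlates with itself
print(correlation_2d(T, -T) < 0)   # prints True: anti-correlated fields score negative
```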

In all the cases presented, STAResNet does a better job at estimating the interference patterns arising from multiple sources than the Clifford ResNet. Additionally, STA has the advantage of unifying the field components into a single object, the Faraday bivector. We can also choose to visualise the scalar part of the magnitude of the Faraday bivector, defined as

\[\mathbf{F}^2 = (E_x \gamma_{10} + E_y \gamma_{20} + B_z \gamma_{12})^2 \quad (5.47)\]

Note how, due to the STA signature, the magnitude of \(\mathbf{F}^2\) , which is what we will plot here, can be negative. Examples of the magnitude of \(\mathbf{F}^2\) plots for each tested \(\Delta t\) are shown in Fig. 5.25. As expected, as \(\Delta t\) increases the PDE solutions become less accurate, but in each of the four cases STAResNet yields magnitude plots whose patterns are more similar to the GT than those of the Clifford ResNet working in 2D GA.

4c4a5bb88d303bdd784fcae1ddb08539_img.jpg

The figure displays a comparison of three methods for solving Maxwell's PDEs over a \(32 \times 32\) spatial grid. The methods are Ground Truth (GT), STAResNet \(G(1,2,0)\) , and Clifford ResNet \(G(2,0,0)\) . The results are shown for four different time steps: \(\Delta t = 25\) s, \(\Delta t = 50\) s, \(\Delta t = 75\) s, and \(\Delta t = 100\) s. The color bar on the right indicates the magnitude of the values, ranging from -0.00010 (dark red) to 0.00004 (dark blue).


Fig. 5.25 Ground truth (left column) shown against the estimates from STAResNet (centre column) and Clifford ResNet (right column) over the test set for varying \(\Delta t\). Values are shown on a \(32 \times 32\) spatial grid.

Impact of obstacles

We make the PDE solution more challenging by increasing the surface size to \(48 \times 48\) and adding a single obstacle of varying dimension and fixed permittivity \(\epsilon = 1.7^2\). The step size has been kept at \(\Delta x = \Delta y = 5 \times 10^{-7} \text{ m}\) and the sampling period at \(\Delta t = 25 \text{ s}\). Light sources follow the same specifications as in Subsection 5.1.
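The text does not state how the ground-truth fields with obstacles are generated; a common choice for Maxwell's equations on a grid is a finite-difference time-domain (FDTD) scheme, in which the obstacle enters simply as a spatially varying permittivity mask. A unitless 2D TE-mode sketch (all names, constants and boundary choices are ours, purely illustrative):

```python
import numpy as np

def fdtd_step(Ex, Ey, Bz, eps, dt=0.1):
    """One unitless 2D TE-mode update for (Ex, Ey, Bz), periodic boundaries.

    The obstacle is encoded in `eps`, a per-cell permittivity map that
    slows the E-field update inside the obstacle region.
    """
    # Curl of E advances Bz (forward differences via np.roll).
    dEy_dx = np.roll(Ey, -1, axis=1) - Ey
    dEx_dy = np.roll(Ex, -1, axis=0) - Ex
    Bz = Bz - dt * (dEy_dx - dEx_dy)
    # Curl of B advances E, scaled by the local permittivity.
    Ex = Ex + dt / eps * (Bz - np.roll(Bz, 1, axis=0))
    Ey = Ey - dt / eps * (Bz - np.roll(Bz, 1, axis=1))
    return Ex, Ey, Bz

# 48x48 grid with a small square obstacle of permittivity 1.7^2.
n = 48
eps = np.ones((n, n))
eps[20:28, 20:28] = 1.7 ** 2
Ex = np.zeros((n, n)); Ey = np.zeros((n, n)); Bz = np.zeros((n, n))
Bz[24, 8] = 1.0  # a point "source" as an initial pulse
for _ in range(100):
    Ex, Ey, Bz = fdtd_step(Ex, Ey, Bz, eps)
```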

7e48dc63fbe634852d4e652a27368859_img.jpg

Figure 5.26: Five different obstacle configurations on a 48x48 grid. The configurations are: 1) a vertical black rectangle on the left; 2) a small black square in the top-left; 3) a thin vertical black line in the center; 4) a thin horizontal black line in the bottom-right; 5) a small black square in the center. The last three configurations (3, 4, and 5) are highlighted with a green border, indicating they are unseen geometries.

Fig. 5.26 The 5 different obstacle configurations. The 3 unseen geometries are highlighted.

The training set includes 30000 frames, coming from 32 sequences in time of the EM field, with five different obstacle configurations appearing with the same probability (i.e. 6000 frames per configuration). The validation set includes 12800 frames with the same five obstacle configurations (i.e. 2560 frames per configuration) but with different light sources with respect to the training set. We employed two different test sets, with 12800 frames each. The first test set presents obstacles in the same positions as the training set but with different light sources. The second test set is generated with three obstacles in locations different from those in the training set, to assess the network's ability to generalise to previously unseen geometries (see Fig. 5.26). To make the comparison even more challenging, we reduced the number of channels of STAResNet from 24 to 23, for a total of 866326 trainable parameters against the 928580 parameters of Clifford ResNet.

Training and validation losses for Clifford ResNet and STAResNet are shown in Fig. 5.27. The trend is the same as that shown in Fig. 5.23: again, in the presence of obstacles with fixed sampling period, STAResNet attains a lower error. Moreover, STAResNet achieves lower MSE and a higher correlation coefficient than 2D Clifford ResNet and real-valued ResNet, regardless of whether testing is on seen or unseen obstacle configurations (see Fig. 5.28).

Examples of the squared magnitude of \(\mathbf{F}\) estimated in the presence of obstacles are given in Fig. 5.29: in this scenario too, STAResNet better preserves the finer details in the Faraday bivector compared to a vanilla GA approach.

2fcccfdf70b3cc337b3f5a399baf18dc_img.jpg

Line graph showing training and validation losses versus number of epochs for the PDE in the presence of obstacles. The y-axis is 'Loss' on a logarithmic scale from 10^-6 to 10^-5. The x-axis is 'Epochs' from 0 to 45. Four lines are plotted: Training G(2,0,0) (solid blue), Val G(2,0,0) (dashed blue), Training G(1,2,0) (solid orange), and Val G(1,2,0) (dashed orange). All lines show a general downward trend, with the orange lines (G(1,2,0)) achieving lower losses than the blue lines (G(2,0,0)).

Fig. 5.27 Training and validation losses versus number of epochs for the PDE in the presence of obstacles, with 5 different obstacle configurations during the training phase.

7c7481493be4ee2061f848f96118b51e_img.jpg

| Model | Seen MSE (↓) | Unseen MSE (↓) | Seen Correlation (↑) | Unseen Correlation (↑) |
|---|---|---|---|---|
| STAResNet - G(1,2,0) | ~1.8E-07 | ~2.5E-07 | ~0.98 | ~0.98 |
| Clifford ResNet - G(2,0,0) | ~1.1E-06 | ~1.3E-06 | ~0.92 | ~0.92 |
| ResNet | ~1.5E-06 | ~1.8E-06 | ~0.90 | ~0.88 |

Two bar charts comparing STAResNet - G(1,2,0) (pink), Clifford ResNet - G(2,0,0) (blue), and ResNet (yellow) across 'Seen' and 'Unseen' obstacle configurations. (a) MSE (↓): STAResNet has the lowest MSE in both cases (~1.8E-07 for Seen, ~2.5E-07 for Unseen). (b) Correlation (↑): STAResNet has the highest correlation in both cases (~0.98 for Seen, ~0.98 for Unseen).

Fig. 5.28 (a) Mean squared error and (b) correlation between estimated and ground truth fields over the two test sets with seen and unseen obstacle configurations.

7ae6f264856fa21a0ff59641887ba259_img.jpg

Figure 5.29 displays a comparison of Ground Truth (GT) and two neural network models (STAResNet \(G(1,2,0)\) and Clifford ResNet \(G(2,0,0)\) ) over a test set. The figure is organized into a grid of heatmaps. The columns are labeled: GT, STAResNet \(G(1,2,0)\) , and Clifford ResNet \(G(2,0,0)\) . The rows are labeled: previously seen geometries (top two rows) and unseen geometries (bottom two rows). Each heatmap shows a 48x48 spatial grid. A color bar on the right indicates the magnitude of the values, ranging from -0.00010 (dark red) to 0.00004 (dark blue).


Fig. 5.29 Ground truth (left column) shown against the estimates from STAResNet (centre column) and Clifford ResNet (right column) over the test set, with seen obstacle configurations (number 5 of Fig. 5.26) shown in the top 2 rows and unseen obstacle configurations (number 7 of Fig. 5.26) shown in the bottom 2 rows. Values are shown on a \(48 \times 48\) spatial grid for one chosen timepoint.

Impact of number of parameters

STAResNet works in a mathematical space with one additional dimension compared to Clifford ResNet in 2D GA. This means that the tensor embedded in \(G(1,2,0)\) in STAResNet will have an extra dimension, deriving from the GA embedding, of size \(2^3 = 8\), as opposed to \(2^2 = 4\) in 2D Clifford ResNet. This impacts the number of trainable parameters of the two pipelines: for the same number of channels or the same size of the convolutional filters, STAResNet will generally have a larger number of trainable parameters.
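The dependence of parameter count on the algebra can be made concrete. A multivector in \(G(p,q,r)\) has \(2^{p+q+r}\) blade coefficients, so, under the simplifying assumption that a Clifford convolution stores one scalar weight per blade per (input, output) channel pair (the exact layer parametrization may differ), the count scales as:

```python
def blade_count(p, q, r):
    """Number of blade coefficients of a multivector in G(p, q, r)."""
    return 2 ** (p + q + r)

def conv_params(c_in, c_out, k, blades):
    """Rough parameter count for one Clifford conv layer, assuming one
    scalar weight per blade per (input, output) channel pair."""
    return c_in * c_out * k * k * blades

# Same channel/kernel budget, different algebras:
sta = conv_params(24, 24, 3, blade_count(1, 2, 0))   # G(1,2,0): 8 blades
ga2d = conv_params(24, 24, 3, blade_count(2, 0, 0))  # G(2,0,0): 4 blades
print(sta, ga2d)  # the STA layer carries twice the weights per channel
```

This is why the text equalizes parameter budgets (e.g. dropping STAResNet from 24 to 23 channels) rather than channel counts when comparing the two networks.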

To verify that STAResNet is consistently superior to a 2D GA approach and to provide a fair comparison, we train both networks with a varying number of hidden channels, ranging between 15 and 40. This yields pipelines with different numbers of trainable parameters, from \(2 \times 10^5\) up to \(2 \times 10^6\). Results are summarized in Fig. 5.30.

12803cdab4af4f047d9b34a2a11d7f09_img.jpg

| Number of trainable parameters | STAResNet G(1,2,0) seen geometries (MSE) | Clifford ResNet G(2,0,0) seen geometries (MSE) | STAResNet G(1,2,0) unseen geometries (MSE) | Clifford ResNet G(2,0,0) unseen geometries (MSE) |
|---|---|---|---|---|
| 1.0E+05 | ~4.5 x 10^-7 | ~1.6 x 10^-6 | ~4.8 x 10^-7 | ~1.8 x 10^-6 |
| 7.0E+05 | ~3.5 x 10^-7 | ~1.2 x 10^-6 | ~4.2 x 10^-7 | ~1.3 x 10^-6 |
| 1.3E+06 | ~3.0 x 10^-7 | ~1.0 x 10^-6 | ~4.0 x 10^-7 | ~1.5 x 10^-6 |
| 1.9E+06 | ~2.5 x 10^-7 | ~0.9 x 10^-6 | ~3.8 x 10^-7 | ~1.4 x 10^-6 |
| 2.5E+06 | ~2.2 x 10^-7 | ~0.8 x 10^-6 | ~3.5 x 10^-7 | ~1.6 x 10^-6 |

Figure 5.30: A line graph showing the Mean Squared Error (MSE) versus the Number of trainable parameters for STAResNet and Clifford ResNet. The x-axis represents the number of trainable parameters from 1.0E+05 to 2.5E+06. The y-axis represents the MSE on a logarithmic scale from 2.0 x 10^-7 to 2.0 x 10^-6. Four data series are plotted: STAResNet G(1,2,0) seen geometries (yellow diamonds), Clifford ResNet G(2,0,0) seen geometries (blue squares), STAResNet G(1,2,0) unseen geometries (red triangles), and Clifford ResNet G(2,0,0) unseen geometries (green circles). STAResNet consistently achieves lower MSE than Clifford ResNet, and seen geometries result in lower MSE than unseen geometries for both networks.

Fig. 5.30 Test error over the estimated EM fields in the presence of seen and unseen obstacle geometries versus the number of trainable parameters.

The first thing to notice is that the error decreases for a larger number of parameters in both networks, as expected. The test MSE with 2D Clifford ResNet, however, never reaches values below \(1 \times 10^{-6}\). Even with 40 channels and above 2 million trainable parameters, 2D Clifford ResNet cannot catch up with STAResNet, which with just 15 channels and 370,000 parameters yields a test error below \(5.5 \times 10^{-7}\). This means that our 2D STAResNet, with 16.4% of the number of parameters of 2D Clifford ResNet, can estimate EM fields twice as accurately. This proves that the improvement offered by STAResNet comes from the mathematical space it works in rather than from it being a generally larger network.

The second thing that can be seen in Fig. 5.30 is the robustness to previously unseen data. The gap in error between seen and unseen obstacle configurations is visibly smaller with STAResNet than with Clifford ResNet, which is less capable of generalising to test data not included in the training set. This holds regardless of the number of parameters of the networks.

f90eb4ab3628ec65f6a9f1bb5024272e_img.jpg

Figure 5.31: (a) Mean squared error (MSE) and (b) correlation between estimated and ground truth EM fields over test set versus rollout steps m for the 2D case. The figure contains two line plots. Plot (a) shows MSE on the y-axis (ranging from 0.0 to 1.5 x 10^-3) against rollout steps m on the x-axis (ranging from 1 to 10). Plot (b) shows SSIM on the y-axis (ranging from 0.95 to 1.0) against rollout steps m on the x-axis (ranging from 1 to 10). Both plots compare STAResNet G(1,2,0) (solid lines) and Clifford ResNet G(2,0,0) (dashed lines) for four different datasets with Δt values of 5s (blue), 8s (purple), 10s (yellow), and 15s (red). In (a), MSE increases with m for all datasets, but is lower for STAResNet than for Clifford ResNet. In (b), SSIM decreases with m for all datasets, but remains higher for STAResNet than for Clifford ResNet.

Fig. 5.31 (a) Mean squared error and (b) structural similarity index (SSIM) between estimated and ground truth EM fields over the test set versus rollout steps \(m\) for the 2D case. Different colors refer to datasets with different \(\Delta t\) between frames.

Impact of rollout

Lastly, we quantify the rollout error yielded by the two models. In the context of sequential modeling, rollout refers to the process of using the model's own predictions as inputs to generate future predictions. For example, at rollout step \(m = 1\), the model predicts \(y_{t+2\Delta t}\) using \(y_t\) and \(y_{t+\Delta t}\), both ground truth. At the next rollout step, the model predicts \(y_{t+3\Delta t}\) using \(y_{t+\Delta t}\) and \(y_{t+2\Delta t}\), where one of the inputs is now the model's output at a previous time step. Successive time steps are estimated in a sliding window fashion, by feeding the model with its own outputs. It is desirable that the rollout error stays bounded, as the model is much more likely to be employed to estimate the PDE evolution in time rather than an isolated snapshot.
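The sliding-window rollout described above can be sketched as follows (the `model` and the scalar "dynamics" are toy placeholders; the real networks take two past field frames and predict the next one):

```python
def rollout(model, y0, y1, steps):
    """Autoregressive rollout: after the first step, the model is fed
    its own predictions in a sliding window of the last two frames."""
    window = [y0, y1]          # ground-truth frames y_t, y_{t+dt}
    preds = []
    for m in range(steps):
        y_next = model(window[-2], window[-1])
        preds.append(y_next)
        window.append(y_next)  # the prediction re-enters the input window
    return preds

# Toy dynamics on scalars: the true evolution doubles the last value.
perfect = lambda a, b: 2 * b
biased = lambda a, b: 2 * b + 0.1   # small one-step error
p_true = rollout(perfect, 1, 2, 4)
p_biased = rollout(biased, 1, 2, 4)
# The per-step gap compounds with m, which is why a bounded rollout
# error matters far more than the one-step error alone.
errors = [abs(a - b) for a, b in zip(p_true, p_biased)]
```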

Since each trajectory in our datasets contains 12 samples, we measure the rollout error for \(m\) ranging between 1 and 10. Results are summarized in Fig. 5.31, in which we measure MSE and structural similarity index (SSIM) between GT and predicted fields.

Since the error rapidly propagates between successive time steps, we train Clifford ResNet and STAResNet on datasets captured for small \(\Delta t\) , i.e. \(\Delta t = \{5, 8, 10, 15\}s\) .

From Fig. 5.31 it is possible to notice that the STAResNet approach yields consistently lower MSE and higher SSIM than Clifford ResNet, and that the gap between the two becomes more visible as the number of rollout steps increases. As \(\Delta t\) increases, however, the gap between the two models tends to narrow. This happens because for large \(\Delta t\) there is little continuity between successive time steps, meaning that the error is already significant for small \(m\), and for large \(m\) the estimates of both STAResNet and Clifford ResNet deviate so much from GT that the comparison is meaningless.

Examples of the evolution of \(\mathbf{F}^2\) as a function of time are given in Figs. 5.32-5.33 for two different sequences. We use the viridis colormap to report the absolute difference, clipped to \([0, 0.02]\), between \(\mathbf{F}^2\) and \(\hat{\mathbf{F}}^2\). Errors in the estimates obtained via STAResNet are visibly more localised and more bounded in magnitude than those obtained via Clifford ResNet.

5.5.4 Experiments in 3D

In the 3D case we consider a volume with spatial resolution of \(28 \times 28 \times 28\) and step size \(\Delta x = \Delta y = \Delta z = 5 \times 10^{-7}m\), with the EM field sampled with varying sampling period \(\Delta t = \{5, 8, 10, 15\}s\). The light is propagated from 6 point sources randomly placed in each of the \(xy\), \(yz\) and \(xz\) planes, for a total of 18 planar sources. The wavelength of the emitted light is \(\lambda = 10^{-5}m\). Each light source emits light with a random phase and random amplitude. Training, validation and test sets are structured similarly to their 2D counterparts.

We measure the MSE error between estimated and GT fields as described in Eq. 5.45, and visualise the magnitude of the 4D spacetime bivector:

\[\mathbf{F}^2 = (E_1\gamma_{10} + E_2\gamma_{20} + E_3\gamma_{30} + B_1\gamma_{23} + B_2\gamma_{31} + B_3\gamma_{12})^2 \quad (5.48)\]
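Expanding the square under the usual STA signature, the timelike bivectors \(\gamma_{i0}\) square to \(+1\) and the spatial bivectors to \(-1\), while the cross terms contribute no scalar part, so the scalar part of \(\mathbf{F}^2\) reduces to the classical Lorentz invariant:

```latex
\operatorname{scalar}(\mathbf{F}^2)
  = E_1^2 + E_2^2 + E_3^2 - B_1^2 - B_2^2 - B_3^2
  = |\mathbf{E}|^2 - |\mathbf{B}|^2
```

This is the same reason the plotted magnitude can take negative values wherever the magnetic contribution dominates.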

Training and validation losses for Clifford ResNet and STAResNet at different sampling periods are shown in Fig. 5.34. Our STA approach achieves lower validation loss in all four cases, similarly to Fig. 5.23. The smaller gap between the loss profiles of STAResNet and Clifford ResNet, as opposed to that in Fig. 5.23, is to be attributed to the small \(\Delta t\) at which the dataset is generated and the smaller size of the domain, rather than to working in 3D instead of 2D.

bdd8a118ce45fd741ae90e74e2ee4f6c_img.jpg

Figure 5.32 displays the results of STAResNet and Clifford ResNet for 2D PDEs, comparing Ground Truth (GT) fields, estimated fields (\(F_{est}\)), and differences (\(\Delta^2\)) at rollout steps \(m = 1, 2, 8, 10\) (rows (a)-(d)). Each row contains heatmaps for the GT field, the \(G(1,2,0)\) and \(G(2,0,0)\) estimates, and the corresponding difference maps. Red circles highlight regions of significant difference in the \(\Delta^2\) columns.

Fig. 5.32 2D GT \(F^2, \hat{F}^2\), estimated \(F_{est} = \{\hat{F}^2, \hat{\hat{F}}^2\}\) and differences \(\Delta^2 = \{|F^2 - \hat{F}^2|, |F^2 - \hat{\hat{F}}^2|\}\) for STAResNet and Clifford ResNet at different rollout steps: (a) \(m = 1\), (b) \(m = 2\), (c) \(m = 8\) and (d) \(m = 10\). Here \(\Delta t = 5s\). Red circles highlight regions of significant difference. Images are shown on a \(48 \times 48\) spatial grid.

2c802ad6f887973ee89f206288b91dea_img.jpg

Figure 5.33 displays four rows of heatmaps comparing Ground Truth (GT) fields, estimated fields (\(F_{est}\)) for the two algebras \(G(2,0,0)\) and \(G(1,2,0)\), and difference fields (\(\Delta^2\)) for STAResNet and Clifford ResNet at rollout steps \(m = 1, 5, 8, 10\) (rows (a)-(d)). Red circles highlight regions of significant difference. The images are shown on a \(48 \times 48\) spatial grid.

Fig. 5.33 2D GT \(F^2, \hat{F}^2\), estimated \(F_{est} = \{\hat{F}^2, \hat{\hat{F}}^2\}\) and differences \(\Delta^2 = \{|F^2 - \hat{F}^2|, |F^2 - \hat{\hat{F}}^2|\}\) for STAResNet and Clifford ResNet at different rollout steps: (a) \(m = 1\), (b) \(m = 5\), (c) \(m = 8\) and (d) \(m = 10\). Here \(\Delta t = 8s\). Red circles highlight regions of significant difference. Images are shown on a \(48 \times 48\) spatial grid.

f106645198833980b601154c2ef98438_img.jpg

Plot (a) showing training and validation losses versus number of epochs for 3D Maxwell's PDEs with Δt = 5s. The y-axis is 'Loss' on a log scale from 10^-5 to 10^-6. The x-axis is 'Epochs' from 0 to 50. Four lines are shown: Training G(3,0,0) (solid blue), Val G(3,0,0) (dashed blue), Training G(1,3,0) (solid orange), and Val G(1,3,0) (dashed orange). All losses decrease rapidly and then fluctuate between 10^-5 and 10^-6.

(a) \(\Delta t = 5\text{s}\) 80c92c2bb75a62d8b5c55b419c4d2ae5_img.jpg

Plot (b) showing training and validation losses versus number of epochs for 3D Maxwell's PDEs with Δt = 8s. The y-axis is 'Loss' on a log scale from 10^-5 to 2x10^-6. The x-axis is 'Epochs' from 0 to 50. Four lines are shown: Training G(3,0,0) (solid blue), Val G(3,0,0) (dashed blue), Training G(1,3,0) (solid orange), and Val G(1,3,0) (dashed orange). All losses decrease rapidly and then fluctuate between 10^-5 and 2x10^-6.

(b) \(\Delta t = 8\text{s}\) 19fac9fb021b0c42b01ebebbc43a048e_img.jpg

Plot (c) showing training and validation losses versus number of epochs for 3D Maxwell's PDEs with Δt = 10s. The y-axis is 'Loss' on a log scale from 10^-5 to 4x10^-6. The x-axis is 'Epochs' from 0 to 50. Four lines are shown: Training G(3,0,0) (solid blue), Val G(3,0,0) (dashed blue), Training G(1,3,0) (solid orange), and Val G(1,3,0) (dashed orange). All losses decrease rapidly and then fluctuate between 10^-5 and 4x10^-6.

(c) \(\Delta t = 10\text{s}\) 3679ad79478df5abcc1c0ead61f8f4aa_img.jpg

Plot (d) showing training and validation losses versus number of epochs for 3D Maxwell's PDEs with Δt = 15s. The y-axis is 'Loss' on a log scale from 4x10^-5 to 6x10^-6. The x-axis is 'Epochs' from 0 to 50. Four lines are shown: Training G(3,0,0) (solid blue), Val G(3,0,0) (dashed blue), Training G(1,3,0) (solid orange), and Val G(1,3,0) (dashed orange). All losses decrease rapidly and then fluctuate between 4x10^-5 and 6x10^-6.

(d) \(\Delta t = 15\text{s}\)

Fig. 5.34 Training and validation losses versus number of epochs for 3D Maxwell's PDEs for instances sampled at (a) 5s, (b) 8s, (c) 10s, (d) 15s.

A plot of the MSE as a function of the number of rollout steps \(m\) for successive time instants is given in Fig. 5.35. For each of the four sampling periods and regardless of the number of rollout steps, STAResNet consistently outperforms Clifford ResNet. Notice how this also holds for \(\Delta t = 15\text{s}\), while in the 2D case (Fig. 5.31) errors explode for \(\Delta t \geq 15\text{s}\).

6371bf2d5bc55af38b7784be69937742_img.jpg

Figure 5.35: A line graph showing Mean Squared Error (MSE) on the y-axis versus rollout steps m on the x-axis. The y-axis ranges from 0.0 to 7.0 x 10^-4. The x-axis ranges from 1 to 10. There are four pairs of lines representing different sampling periods: 5s (blue), 8s (purple), 10s (yellow), and 15s (red). For each pair, a solid line represents STAResNet G(1,3,0) and a dashed line represents Clifford ResNet G(3,0,0). In all cases, the solid line is below the dashed line, indicating lower MSE. The 15s pair shows the steepest increase in MSE, reaching 7.0 x 10^-4 at m=10.

Fig. 5.35 Mean squared error between estimated and ground truth EM fields over test set versus rollout steps \(m\) for the 3D case. Different colors refer to datasets with different \(\Delta t\) in between frames.

Examples of the evolution of the magnitude of the 3D Faraday bivector for successive rollout steps are shown in Figs. 5.36-5.37. The absolute difference between GT and estimated 3D Faraday bivector magnitudes, clipped to \([0, 0.02]\), is shown in the viridis color map, meaning all plots share the same range. The higher the intensity of the plot, the higher the error between ground truth and estimate. It can be noticed how, even after only two time steps, STAResNet yields output closer to GT than the 3D Clifford ResNet.

Slices are taken at the bottom ( \(z = 0\) ), middle ( \(z = 13\) ) and top ( \(z = 27\) ) of the volume. The clipped, absolute differences are shown in viridis color map. Clifford ResNet yields significantly larger errors for each of the four rollout steps and across each of the three vertical slices considered.

9caf59699e41441033fbf98a7d5656f8_img.jpg

A grid of slices for (a) \(m = 3\) and (b) \(m = 5\). In each case, rows correspond to slices at \(z = 0\), \(z = 13\) and \(z = 27\), and columns show the Ground Truth, the \(G(1,3,0)\) estimate \(\hat{F}^2\) and difference \(|F^2 - \hat{F}^2|\), and the \(G(3,0,0)\) estimate \(\hat{F}^2\) and difference \(|F^2 - \hat{F}^2|\).

Fig. 5.36 Slices of the 3D GT \(F^2\) , \(\mathbf{F}^2\) , estimated \(\hat{F}^2\) , \(\hat{\mathbf{F}}^2\) and differences \(|F^2 - \hat{F}^2|\) , \(|\mathbf{F}^2 - \hat{\mathbf{F}}^2|\) for STAResNet and Clifford ResNet at different heights \(z\) of the volume at different rollout steps: (a) \(m = 3\) , (b) \(m = 5\) . \(\Delta t = 5\text{s}\) . Images are shown on a \(28 \times 28\) spatial grid.

a7a93bf707cf3a8d55ede7521a90362f_img.jpg

Figure 5.37 displays two sets of visualizations, (a) \(m = 8\) and (b) \(m = 10\), showing slices of the 3D ground truth (GT) \(F^2\), estimated \(\hat{F}^2\), and differences \(|F^2 - \hat{F}^2|\) for STAResNet and Clifford ResNet at different heights \(z\) of the volume. Each set is organized into a grid with three rows (\(z = 0, 13, 27\)) and five columns: 'Ground Truth', 'G(1,3,0) \(\hat{F}^2\)', 'G(1,3,0) \(|F^2 - \hat{F}^2|\)', 'G(3,0,0) \(\hat{F}^2\)', and 'G(3,0,0) \(|F^2 - \hat{F}^2|\)'. Red circles highlight specific regions in the difference plots.

Fig. 5.37 Slices of the 3D GT \(F^2\) , \(\mathbf{F}^2\) , estimated \(\hat{F}^2\) , \(\hat{\mathbf{F}}^2\) and differences \(|F^2 - \hat{F}^2|\) , \(|\mathbf{F}^2 - \hat{\mathbf{F}}^2|\) for STAResNet and Clifford ResNet at different heights \(z\) of the volume at different rollout steps: (a) \(m = 8\) , (b) \(m = 10\) . \(\Delta t = 5\text{s}\) . Images are shown on a \(28 \times 28\) spatial grid.

5.6 Conclusions

In this chapter, we focused on GA networks to solve PDEs and presented three studies. Part I, comprising Sections 5.3 and 5.4, dealt with the solution of the Navier-Stokes equations in 2D and 3D, respectively, for Computational Fluid Dynamics applications. Part II, comprising Section 5.5, dealt with electromagnetism and the solution of Maxwell's equations.

In Section 5.3 we introduced GA-ReLU to discuss the theme of activation functions in GA networks. GA-ReLU is a modified version of ReLU for multivector-valued networks that attempts to take into account the coupling between multivector coefficients, and we showed how it can improve over the baseline error for a 2D Navier-Stokes PDE problem. GA-ReLU has the limitation of being designed as an adaptation of a real-valued activation (ReLU) and for a specific mathematical space (\(\mathcal{G}_{2,0}\)). Nevertheless, we hope that GA-ReLU can highlight the importance of non-linearities that take into account the structure of multivectors.

In Section 5.4 we introduced Fengbo, a neural operator pipeline able to solve large-scale, 3D PDEs over complex shapes which sits entirely in 3D Clifford Algebra. With Fengbo, we combine the descriptive power of neural operators with the inductive bias and interpretability of networks in Clifford Algebra to obtain a compact pipeline that is able to estimate multiple physical quantities accurately and at once, without extra computational overhead. We reported results on the ShapeNet Car and Ahmed Body datasets, with a test error lower than all previously reported models with the exception of some configurations of GINO. We are able to do so with a model of only 42 million parameters, with reduced computational complexity compared to graph-based models, while also estimating the 3D velocity vector field and without working in a latent space. By leveraging exclusively geometrical information of \(\Omega_D\), Fengbo is a lightweight, expressive and accurate pipeline entirely in 3D Euclidean space.

Lastly, in Section 5.5 we introduced STAResNet, a ResNet-inspired architecture that works with multivectors in STA, to shed light on the importance of choosing the right mathematical space for GA networks. We compared the performance of STAResNet, the first example of a network working in Spacetime Algebra, with Clifford ResNet, working in vanilla GA, on the solution of Maxwell's equations. STAResNet outperforms Clifford ResNet in 2D and 3D, at different sampling periods, in the presence of obstacles in either previously seen or unseen configurations, and over multiple time steps into the future. Most notably, STAResNet is able to generalise better over previously unseen data and achieve a lower error with a fraction of the trainable parameters of Clifford ResNet. This is not surprising, as it is grounded in the physics of the problem: working with Faraday bivectors in STA had already been shown to be a superior choice over \(G(3, 0, 0)\). We therefore conclude that the choice of the right algebra in Clifford networks is a key factor in obtaining more descriptive, compact and accurate learning pipelines.

Chapter 6

Conclusions

「勝負はまだついちゃいないよ」
“Our fight isn’t over yet.”


— U. Thurman as The Bride,
Kill Bill, dir. Q. Tarantino (2003)

6.1 Conclusions

This thesis was focused on practical applications of GA as a tool to model, represent, and transform data in ML problems. It has been written with the goal of being a simple reference for researchers experienced in GA and interested in exploring its potential in the field of ML, and, at the same time, to serve as a guide for ML researchers unfamiliar with GA, offering an introduction to how GA can provide easy and powerful handling of geometrical data.

The work has been structured both chronologically, with the earlier chapters reflecting the initial stages of the PhD, and gradually progressing to more advanced topics, and thematically, with each chapter focusing on a different application domain.

In Chapter 1, we briefly introduced the fundamental concepts of GA and set the context for the thesis. It covered the basic definitions and operators of GA and various related algebras, such as CGA and STA, employed throughout the thesis.

In Chapter 2, we focused on the representation of rotations and introduced two problems where rotations were represented as rotors: (1) learning rotations for 3D point cloud pose estimation and inverse kinematics, and (2) molecular geometry optimisation.

In Chapter 3, GA was used as both a modeling and computational tool for protein structure prediction (PSP) problems. Protein structures were modeled as 3D planes in CGA to develop novel orientational features, while GA-equivariant layers were used as projectors in PSP pipelines, leading to more accurate estimations of tertiary structures.

In Chapter 4, we tackled the problem of estimating 3D poses using GA, representing these poses as motors in 1d-Up CGA. We introduced CGAPoseNet+GCAN, which leverages GCAN layers and achieves state-of-the-art results over 13 datasets, and Define, Refine, Align, a pipeline including attentional, equivariant and rotational layers to align partially overlapping 3D line bundles.

In Chapter 5, we addressed the solution of PDEs using GA networks and further characterised this type of architecture. We introduced GA-ReLU, an activation function for more accurately solving the 2D Navier-Stokes equations, Fengbo, a pipeline for solving large-scale 3D PDEs over irregular geometries, and STAResNet, an architecture in STA for solving Maxwell’s equations.

6.2 Contributions

In this thesis we have made the following contributions:

  • we show the benefits of representing rotations in terms of rotors in applications ranging from pose estimation and inverse kinematics to molecular geometry optimisation.

  • we introduce cost maps, a novel descriptive feature based on a GA modelling of protein structures.

  • we show original applications of GA networks in protein structure prediction, camera pose estimation, and PDE modelling.

  • we show how GA equivariant layers employed as projector modules in PSP pipelines yield more accurate protein structures in terms of GDT_TS and GDT_HA scores.

  • we report state-of-the-art results in camera pose estimation tasks over the Cambridge Landmarks and 7 Scenes datasets, and in the solution of the 3D Navier-Stokes equations over irregular geometries on the ShapeNet Car and Ahmed Body datasets.

  • we better characterise GA networks, demonstrating the importance of activation functions which respect the multivector structure and the choice of the algebra in which to embed data based on the specific problem to be solved.

  • we introduce the first example of a diffusion pipeline working partly in a GA framework, opening the door to geometry-aware, constrained diffusion models.

  • more generally, we show how the technique of using GA layers to embed data into an algebra and downsample them from many “proposals” to a single solution in regression tasks (Chapter 3.8, Chapter 4) provides a valuable inductive bias that makes networks converge faster and yields more accurate results.
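For reference, the rotor representation of rotations invoked in the first contribution is the standard GA construction: a rotation by angle \(\theta\) in the plane of a unit bivector \(\hat{B}\) is encoded as an even-grade element \(R\) and applied by the sandwich product.

```latex
% Rotor encoding a rotation by angle \theta in the plane of unit bivector \hat{B}:
R = e^{-\hat{B}\theta/2} = \cos\tfrac{\theta}{2} - \hat{B}\sin\tfrac{\theta}{2},
\qquad \hat{B}^2 = -1,
\qquad
x' = R\,x\,\tilde{R}, \qquad R\tilde{R} = 1 .
```

Unlike rotation matrices or Euler angles, this parameterisation composes by the geometric product and avoids singularities, which underlies its use across the pose estimation, inverse kinematics, and molecular geometry applications listed above.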

6.3 Future Work...

In this thesis, we prioritised an application-oriented approach. However, we recognise that a more rigorous theoretical characterisation of the GA models presented remains necessary, especially concerning the interpretability of their intermediate outputs, which we referenced frequently throughout.

For protein structure prediction (PSP), we limited our focus to small-scale models due to computational constraints. A potential research direction could involve applying the concepts introduced in this thesis to larger PSP pipelines, such as AlphaFold, to determine whether GA layers could achieve state-of-the-art results in realistic pipelines used by the broader scientific community.

In the context of partial differential equations (PDEs), we examined only a few of the most notable equations relevant to engineering applications. Future work may include validating our claims (particularly those in Chapter 5.3 regarding the activation function and Chapter 5.5 concerning the choice of algebra) against other PDEs, such as those encountered in quantum physics.

6.4 ...and Beyond

The potential application scenarios for Geometric Algebra (GA) are virtually limitless. Since any neural architecture can, in principle, be reconfigured into a GA-based form, we believe GA offers a simple yet powerful inductive bias for tasks involving structured data with inherent geometric meaning. This invites a fundamental reconsideration of how we represent, understand, and manipulate geometric data and physical transformations in machine learning.

Recent advances in large-scale models, such as Vision Transformers (ViTs), diffusion models, and Vision-Language Models (VLMs), have demonstrated the power of general-purpose architectures. However, these models often lack geometric awareness, relying instead on brute-force scale and data to capture geometric structure implicitly. GA-based networks offer a compelling complement: they can embed symmetries, conservation laws, and spatial relationships directly into the architecture, potentially improving sample efficiency, interpretability, and physical plausibility, often with lower model dimensionality and computational cost.

As geometric priors become increasingly important in fields like robotics, embodied AI, molecular modeling, and physical simulation, we view GA as a foundation for the next generation of models that natively reason about transformations, spaces, and structure, not just pixels or tokens.

References

  • [1] Achour, S., Hosni, Z., Darghouthi, S., and Syme, C. (2021). Assisted dipeptide bond formation: glycine as a case study. Heliyon, 7(6):e07276.

  • [2] Adhikari, B. (2020a). A fully open-source framework for deep learning protein real-valued distances. Scientific reports, 10(1):1–10.

  • [3] Adhikari, B. (2020b). A fully open-source framework for deep learning protein real-valued distances. Scientific reports, 10(1):13374.

  • [4] Agarwal, S., Furukawa, Y., Snavely, N., Simon, I., Curless, B., Seitz, S. M., and Szeliski, R. (2011). Building Rome in a day. Communications of the ACM, 54(10):105–112.

  • [5] Ahmed, S. R., Ramm, G., and Faltin, G. (1984). Some Salient Features of the Time-Averaged Ground Vehicle Wake. SAE Technical Paper, 840300.

  • [6] Al-Aradi, A., Correia, A., Jardim, G., de Freitas Naiff, D., and Saporito, Y. (2022). Extensions of the deep Galerkin method. Applied Mathematics and Computation, 430:127287.

  • [7] Alshawa, M. (2007). ICL: Iterative closest line. A novel point cloud registration algorithm based on linear features. Ekscentar, 10:53–59.

  • [8] Azizzadenesheli, K., Kovachki, N., Li, Z., Liu-Schiaffini, M., Kossaifi, J., and Anandkumar, A. (2024). Neural operators for accelerating scientific simulations and design. Nature Reviews Physics, pages 1–9.

  • [9] Baillard, C., Schmid, C., Zisserman, A., and Fitzgibbon, A. (1999). Automatic line matching and 3D reconstruction of buildings from multiple views. In ISPRS Conference on Automatic Extraction of GIS Objects from Digital Imagery, volume 32, pages 69–80.

  • [10] Bartoli, A. and Sturm, P. (2001). The 3D line motion matrix and alignment of line reconstructions. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1, pages I–I.

  • [11] Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008). Speeded-up robust features (SURF). Computer vision and image understanding, 110(3):346–359.

  • [12] Bayro-Corrochano, E. J. (2001). Geometric neural computing. IEEE Transactions on Neural Networks, 12(5):968–986.

  • [13] Bloesch, M., Czarnowski, J., Clark, R., Leutenegger, S., and Davison, A. J. (2018). CodeSLAM—learning a compact, optimisable representation for dense visual SLAM. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2560–2568.

  • [14] Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., and Rother, C. (2014). Learning 6d object pose estimation using 3d object coordinates. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part II 13, pages 536–551. Springer.

  • [15] Brachmann, E., Michel, F., Krull, A., Yang, M. Y., Gumhold, S., et al. (2016). Uncertainty-driven 6d pose estimation of objects and scenes from a single rgb image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3364–3372.

  • [16] Brahmachary, S. and Thuerey, N. (2024). Unsteady cylinder wakes from arbitrary bodies with differentiable physics-assisted neural network. Physical Review E, 109(5):055304.

  • [17] Brandstetter, J., van den Berg, R., Welling, M., and Gupta, J. K. (2023). Clifford Neural Layers for PDE Modeling. The Eleventh International Conference on Learning Representations.

  • [18] Brégier, R. (2021). Deep regression on manifolds: a 3d rotation case study. In 2021 International Conference on 3D Vision (3DV), pages 166–174. IEEE.

  • [19] Brehmer, J., De Haan, P., Behrendts, S., and Cohen, T. S. (2023). Geometric Algebra Transformer. Advances in Neural Information Processing Systems, 36:35472–35496.

  • [20] Bronstein, M. M., Bruna, J., Cohen, T., and Veličković, P. (2021). Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478.

  • [21] Buchholz, S. (2005). A theory of neural computation with Clifford algebras. PhD thesis, Christian-Albrechts-Universität zu Kiel.

  • [22] Buchholz, S. and Sommer, G. (2000). Learning geometric transformations with Clifford neurons. In International Workshop on Algebraic Frames for the Perception-Action Cycle, pages 144–153. Springer.

  • [23] Buchholz, S. and Sommer, G. (2001). Clifford algebra multilayer perceptrons. In Geometric Computing with Clifford Algebras: Theoretical Foundations and Applications in Computer Vision and Robotics, pages 315–334. Springer.

  • [24] Burley, S. K., Berman, H. M., Kleywegt, G. J., Markley, J. L., Nakamura, H., and Velankar, S. (2017). Protein Data Bank (PDB): the single global macromolecular structure archive. Protein Crystallography, pages 627–641.

  • [25] Cao, S. (2021). Choose a transformer: Fourier or galerkin. Advances in neural information processing systems, 34:24924–24940.

  • [26] Carleo, G., Cirac, I., Cranmer, K., Daudet, L., Schuld, M., Tishby, N., Vogt-Maranto, L., and Zdeborová, L. (2019). Machine learning and the physical sciences. Reviews of Modern Physics, 91(4):045002.

  • [27] Ceci, G., Mucherino, A., D’Apuzzo, M., Serafino, D. D., Costantini, S., Facchiano, A., and Colonna, G. (2007). Computational methods for protein fold prediction: an ab-initio topological approach. In Data Mining in Biomedicine, pages 391–429. Springer.

  • [28] Censi, A. (2008). An ICP variant using a point-to-line metric. In 2008 IEEE International Conference on Robotics and Automation, pages 19–25.

  • [29] Chang, A. X., Funkhouser, T., Guibas, L. J., Hanrahan, P., Huang, Z., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., et al. (2015). ShapeNet: An Information-Rich 3D Model Repository. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).

  • [30] Chen, J., Yin, Y., Birdal, T., Chen, B., Guibas, L. J., and Wang, H. (2022). Projective manifold gradient layer for deep rotation regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6646–6655.

  • [31] Chen, K., Snavely, N., and Makadia, A. (2021). Wide-baseline relative camera pose estimation with directional learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3258–3268.

  • [32] Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258.

  • [33] Clifford, W. K. (1882). Preliminary Sketch of Biquaternions (1873). Mathematical Papers, 658.

  • [34] Cohen, T. and Welling, M. (2016). Group equivariant convolutional networks. International conference on machine learning, pages 2990–2999.

  • [35] Cohen, T. S. and Welling, M. (2017). Steerable CNNs. In International Conference on Learning Representations.

  • [36] Costa, A., Ponnampati, M., Jacobson, J. M., and Chatterjee, P. (2021). Distillation of MSA Embeddings to Folded Protein Structures with Graph Transformers. bioRxiv.

  • [37] Cuomo, S., Di Cola, V. S., Giampaolo, F., Rozza, G., Raissi, M., and Piccialli, F. (2022). Scientific machine learning through physics-informed neural networks: Where we are and what’s next. Journal of Scientific Computing, 92(3):88.

  • [38] Das, R. and Baker, D. (2008). Macromolecular modeling with rosetta. Annu. Rev. Biochem., 77(1):363–382.

  • [39] De Haan, P., Weiler, M., Cohen, T., and Welling, M. (2020). Gauge equivariant mesh cnns: Anisotropic convolutions on geometric graphs. arXiv preprint arXiv:2003.05425.

  • [40] Defferrard, M., Milani, M., Gusset, F., and Perraudin, N. (2020). Deepsphere: a graph-based spherical cnn. arXiv preprint arXiv:2012.15000.

  • [41] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE.

  • [42] Deng, J., Li, X., Xiong, H., Hu, X., and Ma, J. (2024). Geometry-guided conditional adaptation for surrogate models of large-scale 3D PDEs on arbitrary geometries. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pages 5790–5798.

  • [43] Doran, C. and Lasenby, A. (2003). Geometric algebra for physicists. Cambridge University Press.

  • [44] Dorst, L., Fontijne, D., and Mann, S. (2009). Geometric Algebra for Computer Science: An Object-Oriented Approach to Geometry. Morgan Kaufmann.

  • [45] Dwivedi, V. P. and Bresson, X. (2020). A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699.

  • [46] Eide, E. R. (2018). Camera Calibration using Conformal Geometric Algebra. Master’s thesis, University of Cambridge, Engineering Department.

  • [47] Elmoozy, A., Dong, X., Lu, T., Westendorp, R., and Reddy, K. (2021). PoseGNN: Camera pose estimation system using graph neural networks. arXiv preprint arXiv:2103.09435.

  • [48] Engel, J., Schöps, T., and Cremers, D. (2014). LSD-SLAM: Large-scale direct monocular SLAM. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part II 13, pages 834–849. Springer.

  • [49] Fang, Q., Zhao, K., Tang, D., Zhou, Z., Zhou, Y., Hu, T., and Zhou, H. (2018). Euler angles based loss function for camera localization with deep learning. In 2018 IEEE 8th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), pages 61–66. IEEE.

  • [50] Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395.

  • [51] Fout, A., Byrd, J., Shariat, B., and Ben-Hur, A. (2017). Protein interface prediction using graph convolutional networks. Advances in neural information processing systems, 30.

  • [52] Friswell, M. I., Mottershead, J. E., Friswell, M., and Mottershead, J. (1995). Finite element modelling. Springer.

  • [53] Fu, Q., Yu, H., Lai, L., Wang, J., Peng, X., Sun, W., and Sun, M. (2019). A robust RGB-D SLAM system with points and lines for low texture indoor environments. IEEE Sensors Journal, 19(21):9908–9920.

  • [54] Furukawa, Y. and Ponce, J. (2009). Accurate, dense, and robust multiview stereopsis. IEEE transactions on pattern analysis and machine intelligence, 32(8):1362–1376.

  • [55] Gao, H. and Ji, S. (2019). Graph u-nets. In international conference on machine learning, pages 2083–2092. PMLR.

  • [56] Gao, S., Wan, J., Ping, Y., Zhang, X., Dong, S., Yang, Y., Ning, H., Li, J., and Guo, Y. (2021). Pose refinement with joint optimization of visual points and lines. In 2021 IEEE International Conference on Image Processing (ICIP), pages 1769–1773. IEEE.

  • [57] Gligorijević, V., Renfrew, P. D., Kosciolek, T., Leman, J. K., Berenberg, D., Vatanen, T., Chandler, C., Taylor, B. C., Fisk, I. M., Vlamakis, H., et al. (2021). Structure-based protein function prediction using graph convolutional networks. Nature communications, 12(1):3168.

  • [58] Grassia, F. S. (1998). Practical parameterization of rotations using the exponential map. Journal of graphics tools, 3(3):29–48.

  • [59] Grimme, S., Bannwarth, C., and Shushkov, P. (2017). A robust and accurate tight-binding quantum chemical method for structures, vibrational frequencies, and noncovalent interactions of large molecular systems parametrized for all spd-block elements (Z = 1–86). Journal of chemical theory and computation, 13(5):1989–2009.

  • [60] Hackel, T., Savinov, N., Ladicky, L., Wegner, J., Schindler, K., and Pollefeys, M. (2017). Semantic3D. Net: A new large-scale point cloud classification benchmark. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4:91–98.

  • [61] Hadfield, H., Wieser, E., Arsenovic, A., Kern, R., and The Pygae Team (2020). pygae/clifford: v1.3.1. DOI: https://doi.org/10.5281/zenodo.1453978.

  • [62] Hamilton, W., Ying, Z., and Leskovec, J. (2017). Inductive representation learning on large graphs. Advances in neural information processing systems, 30.

  • [63] Hartley, R. and Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge university press.

  • [64] He, K., Zhang, X., Ren, S., and Sun, J. (2016a). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778.

  • [65] He, K., Zhang, X., Ren, S., and Sun, J. (2016b). Identity mappings in deep residual networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, pages 630–645. Springer.

  • [66] Held, D., Thrun, S., and Savarese, S. (2016). Learning to track at 100 fps with deep regression networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 749–765. Springer.

  • [67] Helton, J. C. and Davis, F. J. (2003). Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering & System Safety, 81(1):23–69.

  • [68] Hestenes, D. (2015). Space-time algebra, 2nd Edition. Springer.

  • [69] Hildenbrand, D. and Charrier, P. (2011). Conformal geometric objects with focus on oriented points. In ICCA9, 7th International Conference on Clifford Algebras and Their Applications.

  • [70] Hitzer, E. (2012). Introduction to Clifford’s geometric algebra. Journal of the Society of Instrument and Control Engineers, 51(4):338–350.

  • [71] Hitzer, E. (2022). Inner product of two oriented points in conformal geometric algebra. In ICACGA 2022, 1st International Conference on Advanced Computational applications of Geometric Algebra.

  • [72] Hodgins, F. and Macey, J. (2009). Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) Database. CMU-RI-TR-08-22.

  • [73] Hofer, M., Maurer, M., and Bischof, H. (2014). Improving sparse 3D models for man-made environments using line-based 3D reconstruction. In 2014 2nd International Conference on 3D Vision, volume 1, pages 535–542. IEEE.

  • [74] Hofer, M., Maurer, M., Bischof, H., and Leberl, F. (2016). Efficient 3D scene abstraction using line segments. In Computer Vision and Image Understanding, volume 145, pages 64–75. Elsevier.

  • [75] Holl, P., Koltun, V., Um, K., and Thuerey, N. (2020). phiflow: A differentiable pde solving framework for deep learning via physical simulations. In NeurIPS workshop, volume 2.

  • [76] Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366.

  • [77] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

  • [78] Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. (2017a). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708.

  • [79] Huang, Z., Wan, C., Probst, T., and Van Gool, L. (2017b). Deep learning on lie groups for skeleton-based action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6099–6108.

  • [80] Jie, F. and Coauthor, S. (2024). MEG-Net: A Deep Learning Framework for 3D Line Matching and Pose Estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1234–1241.

  • [81] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Židek, A., Potapenko, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589.

  • [82] Kahlown, R. (2022). TensorFlow Geometric Algebra. Zenodo.

  • [83] Kaiser, R., Stockton, A., Kim, Y., Jensen, E., and Mathies, R. (2013). On the formation of dipeptides in interstellar model ices. The Astrophysical Journal, 765(2):111.

  • [84] Kaján, L., Hopf, T. A., Kalaš, M., Marks, D. S., and Rost, B. (2014). FreeContact: fast and free software for protein contact prediction from residue co-evolution. BMC bioinformatics, 15(1):1–6.

  • [85] Kanzler, M., Rautenhaus, M., and Westermann, R. (2018). A voxel-based rendering pipeline for large 3D line sets. IEEE transactions on visualization and computer graphics, 25(7):2378–2391.

  • [86] Kar, A., Tulsiani, S., Carreira, J., and Malik, J. (2015). Category-specific object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1966–1974.

  • [87] Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L. (2021). Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440.

  • [88] Kendall, A. and Cipolla, R. (2016). Modelling uncertainty in deep learning for camera relocalization. In 2016 IEEE international conference on Robotics and Automation (ICRA), pages 4762–4769. IEEE.

  • [89] Kendall, A. and Cipolla, R. (2017). Geometric loss functions for camera pose regression with deep learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5974–5983.

  • [90] Kendall, A., Grimes, M., and Cipolla, R. (2015). Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the IEEE international conference on computer vision, pages 2938–2946.

  • [91] Kendall, A. G. (2019). Geometry and uncertainty in deep learning for computer vision. PhD thesis, University of Cambridge, UK.

  • [92] Klein, G. and Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In 2007 6th IEEE and ACM international symposium on mixed and augmented reality, pages 225–234. IEEE.

  • [93] Koch, T., Korner, M., and Fraundorfer, F. (2016). Automatic alignment of indoor and outdoor building models using 3D line segments. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 10–18.

  • [94] Langlois, P.-A., Boulch, A., and Marlet, R. (2019). Surface reconstruction from 3d line segments. In 2019 International Conference on 3D Vision (3DV), pages 553–563. IEEE.

  • [95] Lasenby, A. (2004). Recent applications of conformal geometric algebra. In Computer Algebra and Geometric Algebra with Applications, pages 298–328. Springer.

  • [96] Lasenby, A. (2011). Rigid body dynamics in a constant curvature space and the ‘1D-up’ approach to conformal geometric algebra. In Guide to geometric algebra in practice, pages 371–389. Springer.

  • [97] Lasenby, A. (2020). A 1D Up approach to conformal geometric algebra: applications in line fitting and quantum mechanics. Advances in Applied Clifford Algebras, 30(2):1–16.

  • [98] Lasenby, A. and Doran, C. (2001). Physical applications of geometric algebra. Handout collection from a Cambridge University lecture course.

  • [99] Lasenby, A. and Lasenby, J. (2001). Applications of geometric algebra in physics and links with engineering. In Geometric Algebra with Applications in Science and Engineering, pages 430–457. Springer.

  • [100] Lasenby, A., Lasenby, J., and Matsantonis, C. (2022). Reconstructing a rotor from initial and final frames using characteristic multivectors: With applications in orthogonal transformations. Mathematical Methods in the Applied Sciences.

  • [101] Lasenby, J. and Dorst, L. (2011). Guide to geometric algebra in practice. Springer.

  • [102] Lasenby, J., Hadfield, H., and Lasenby, A. (2019). Calculating the rotor between conformal objects. Advances in Applied Clifford Algebras, 29(5):1–9.

  • [103] Lasenby, J., Lasenby, A., and Doran, C. (2000). A unified mathematical language for physics and engineering in the 21st century. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 358(1765):21–39.

  • [104] LeCun, Y., Bengio, Y., et al. (1995). Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995.

  • [105] Lee, M., Park, S., and Kim, J. (2020). Learning-based 3D Line Registration for Scene Understanding. IEEE Transactions on Robotics, 36(4):1234–1245.

  • [106] Lee, S. J. and Hwang, S. S. (2019). Elaborate monocular point and line slam with robust initialization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1121–1129.

  • [107] Lemieux, G. S.-P., Paquet, E., Viktor, H. L., and Michalowski, W. (2022). Geometric deep learning for protein–protein interaction predictions. IEEE Access, 10:90045–90055.

  • [108] Li, J., Yue, J., Zhang, W., and Duan, W. (2022). The deep learning galerkin method for the general stokes equations. Journal of Scientific Computing, 93(1):5.

  • [109] Li, K., Pfaff, F., and Hanebeck, U. D. (2020a). Grid-based quaternion filter for SO(3) estimation. In 2020 European Control Conference (ECC), pages 1738–1744. IEEE.

  • [110] Li, Z., Huang, D. Z., Liu, B., and Anandkumar, A. (2023). Fourier neural operator with learned deformations for pdes on general geometries. Journal of Machine Learning Research, 24(388):1–26.

  • [111] Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. (2020b). Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485.

  • [112] Li, Z., Kovachki, N., Choy, C., Li, B., Kossaifi, J., Otta, S., Nabian, M. A., Stadler, M., Hundt, C., Azizzadenesheli, K., et al. (2024). Geometry-informed neural operator for large-scale 3d pdes. Advances in Neural Information Processing Systems, 36.

  • [113] Li, Z., Kovachki, N. B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., Anandkumar, A., et al. (2020c). Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations.

  • [114] Lippe, P., Veeling, B., Perdikaris, P., Turner, R., and Brandstetter, J. (2024). Pde-refiner: Achieving accurate long rollouts with neural pde solvers. Advances in Neural Information Processing Systems, 36.

  • [115] Liszka, T. and Orkisz, J. (1980). The finite difference method at arbitrary irregular grids and its application in applied mechanics. Computers & Structures, 11(1-2):83–95.

  • [116] Liu, L., Li, H., Yao, H., and Zha, R. (2021). Pluckernet: Learn to register 3d line reconstructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1842–1852.

  • [117] Liu, S., Yu, Y., Pautrat, R., Pollefeys, M., and Larsson, V. (2023). 3D line mapping revisited. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21445–21455.

  • [118] Liu, Y., Huang, T. S., and Faugeras, O. D. (1990). Determination of camera location from 2-D to 3-D line and point correspondences. IEEE Transactions on pattern analysis and machine intelligence, 12(1):28–37.

  • [119] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110.

  • [120] Lu, X., Ma, C., Ni, B., Yang, X., Reid, I., and Yang, M.-H. (2018). Deep regression tracking with shrinkage loss. In Proceedings of the European conference on computer vision (ECCV), pages 353–369.

  • [121] Luisetto, M., Edbey, K., Mashori, G., Yesvi, A., and Latyschev, O. (2021). OPEN and CLOSED state of SPIKE SARS-COV-2: relationship with some integrin binding. A biological molecular approach to better understand the coagulant effect. Archives of Biotechnology and Biomedicine, 5(1):049–056.

  • [122] Lütjens, B., Crawford, C. H., Watson, C. D., Hill, C., and Newman, D. (2022). Multiscale Neural Operator: Learning Fast and Grid-independent PDE Solvers. In ICML 2022 2nd AI for Science Workshop.

  • [123] Mancini, G., Fusè, M., Lazzari, F., and Barone, V. (2022). Fast exploration of potential energy surfaces with a joint venture of quantum chemistry, evolutionary algorithms and unsupervised learning. Digital Discovery, 1(6):790–805.

  • [124] Marchand, E., Uchiyama, H., and Spindler, F. (2015). Pose estimation for augmented reality: a hands-on survey. IEEE transactions on visualization and computer graphics, 22(12):2633–2651.

  • [125] Masci, J., Boscaini, D., Bronstein, M., and Vandergheynst, P. (2015). Geodesic convolutional neural networks on riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops, pages 37–45.

  • [126] Mathis, A., Schneider, S., Lauer, J., and Mathis, M. W. (2020). A primer on motion capture with deep learning: principles, pitfalls, and perspectives. Neuron, 108(1):44–65.

  • [127] Melekhov, I., Ylioinas, J., Kannala, J., and Rahtu, E. (2017). Relative camera pose estimation using convolutional neural networks. In International Conference on Advanced Concepts for Intelligent Vision Systems, pages 675–687. Springer.

  • [128] Melnyk, P., Felsberg, M., and Wadenbäck, M. (2021). Embed me if you can: A geometric perceptron. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1276–1284.

  • [129] Micusik, B. and Kosecka, J. (2009). Multi-view superpixel stereo in urban environments. International Journal of Computer Vision, 84(1):97–113.

  • [130] Micusik, B., Wildenauer, H., and Kosecka, J. (2008). Structure from motion with line segments under relaxed endpoint constraints. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE.

  • [131] Mongiardo, M., Mastri, F., Monti, G., and Rozzi, T. (2019). Maxwell’s equations and potentials in dirac form using geometric algebra. In 2019 IEEE MTT-S International Wireless Symposium (IWS), pages 1–3. IEEE.

  • [132] Ommer, B. and Malik, J. (2009). Multi-scale object detection by clustering lines. In 2009 IEEE 12th International Conference on Computer Vision, pages 484–491. IEEE.

  • [133] Pan, L., Cai, Z., and Liu, Z. (2024). Robust partial-to-partial point cloud registration in a full range. IEEE Robotics and Automation Letters, 9(3):2861–2868.

  • [134] Pavllo, D., Grangier, D., and Auli, M. (2018). Quaternet: A quaternion-based recurrent model for human motion. arXiv preprint arXiv:1805.06485.

  • [135] Pearson, J. and Bisset, D. (1994). Neural networks in the Clifford domain. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), volume 3, pages 1465–1469. IEEE.

  • [136] Pepe, A. (2025). Torch-GA: building Geometric Algebra Networks in PyTorch. Zenodo.

  • [137] Pepe, A., Buchholz, S., and Lasenby, J. (2024a). Clifford Group Equivariant Neural Network Layers for Protein Structure Prediction. In Northern Lights Deep Learning Conference, pages 205–211. PMLR.

  • [138] Pepe, A., Buchholz, S., and Lasenby, J. (2024b). GA-reLU: an activation function for geometric algebra networks applied to 2d navier-stokes PDEs. In ICLR 2024 Workshop on AI4DifferentialEquations In Science.

  • [139] Pepe, A., Buchholz, S., and Lasenby, J. (2024c). STAResNet: a Network in Spacetime Algebra to solve Maxwell’s PDEs. arXiv preprint arXiv:2408.13619.

  • [140] Pepe, A. and Lasenby, J. (2023). CGAPoseNet: Camera pose regression via a 1D-Up approach to Conformal Geometric Algebra. arXiv preprint arXiv:2302.05211.

  • [141] Pepe, A. and Lasenby, J. (2024). Modeling orientational features via geometric algebra for 3d protein coordinates prediction. Mathematical Methods in the Applied Sciences, 47(14):11385–11404.

  • [142] Pepe, A., Lasenby, J., and Buchholz, S. (2024d). CGAPoseNet+GCAN: A geometric clifford algebra network for geometry-aware camera pose regression. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 6593–6603.

  • [143] Pepe, A., Lasenby, J., and Chacón, P. (2022a). Geometric Algebra Models of Proteins for Three-Dimensional Structure Prediction. In International Conference on Advanced Computational Applications of Geometric Algebra, pages 63–74. Springer.

  • [144] Pepe, A., Lasenby, J., and Chacon, P. (2022b). Geometric Algebra Models of Proteins for Three-Dimensional Structure Prediction: A Detailed Analysis. In International Conference on Advanced Computational Applications of Geometric Algebra, pages 141–159. Springer.

  • [145] Pepe, A., Lasenby, J., and Chacon, P. (2022c). Using a graph transformer network to predict 3d coordinates of proteins via geometric algebra modelling. In International Workshop on Empowering Novel Geometric Algebra for Graphics and Engineering, pages 83–95. Springer.

  • [146] Pepe, A., Lasenby, J., and Chacón, P. (2024e). Learning rotations. Mathematical Methods in the Applied Sciences, 47(3):1204–1217.

  • [147] Peretroukhin, V., Giamou, M., Rosen, D. M., Greene, W. N., Roy, N., and Kelly, J. (2020). A smooth representation of belief over SO(3) for deep rotation learning with uncertainty. arXiv preprint arXiv:2006.01031.

  • [148] Perrone, N. and Kao, R. (1975). A general finite difference method for arbitrary meshes. Computers & Structures, 5(1):45–57.

  • [149] Pfaff, T., Fortunato, M., Sanchez-Gonzalez, A., and Battaglia, P. (2020). Learning Mesh-Based Simulation with Graph Networks. In International Conference on Learning Representations.

  • [150] Phaniteja, S., Dewangan, P., Guhan, P., Sarkar, A., and Krishna, K. M. (2017). A deep reinforcement learning approach for dynamically stable inverse kinematics of humanoid robots. In 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 1818–1823.

  • [151] Puzzarini, C., Biczysko, M., Barone, V., Largo, L., Pena, I., Cabezas, C., and Alonso, J. L. (2014). Accurate characterization of the peptide linkage in the gas phase: A joint quantum-chemical and rotational spectroscopy study of the glycine dipeptide analogue. The Journal of Physical Chemistry Letters, 5(3):534–540.

  • [152] Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2017). Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660.

  • [153] Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2017). Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348:683–693.

  • [154] Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707.

  • [155] Raonic, B., Molinaro, R., De Ryck, T., Rohner, T., Bartolucci, F., Alaifari, R., Mishra, S., and de Bézenac, E. (2024). Convolutional neural operators for robust and accurate learning of PDEs. Advances in Neural Information Processing Systems, 36.

  • [156] Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5–9, 2015, proceedings, part III 18, pages 234–241. Springer.

  • [157] Roy, S., Shinde, R., Phillips, C. E., Kumar, A., Leong, W. J., Maskey, M., and Ramachandran, R. (2024). Clifford Neural Operators on Atmospheric Data Influenced Partial Differential Equations. In 12th International Conference on Learning Representations.

  • [158] Ruhe, D., Brandstetter, J., and Forré, P. (2023a). Clifford group equivariant neural networks. Advances in Neural Information Processing Systems, 36:62922–62990.

  • [159] Ruhe, D., Gupta, J. K., De Keninck, S., Welling, M., and Brandstetter, J. (2023b). Geometric clifford algebra networks. In Proceedings of the 40th International Conference on Machine Learning, pages 29306–29337.

  • [160] Sandford, S. A., Aléon, J., Alexander, C. M. O., Araki, T., Bajt, S., Baratta, G. A., Borg, J., Bradley, J. P., Brownlee, D. E., Brucato, J. R., et al. (2006). Organics captured from comet 81P/Wild 2 by the Stardust spacecraft. Science, 314(5806):1720–1724.

  • [161] Saxena, A., Driemeyer, J., and Ng, A. Y. (2009). Learning 3-d object orientation from images. In 2009 IEEE International conference on robotics and automation, pages 794–800. IEEE.

  • [162] Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80.

  • [163] Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., and Fitzgibbon, A. (2013). Scene coordinate regression forests for camera relocalization in RGB-D images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2930–2937.

  • [164] Silva, D.-A., Bowman, G. R., Sosa-Peinado, A., and Huang, X. (2011). A role for both conformational selection and induced fit in ligand binding by the LAO protein. PLoS computational biology, 7(5):e1002054.

  • [165] Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

  • [166] Sirignano, J. and Spiliopoulos, K. (2018). DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364.

  • [167] Storn, R. and Price, K. (1997). Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341.

  • [168] Taketomi, T., Okada, K., Yamamoto, G., Miyazaki, J., and Kato, H. (2014). Camera pose estimation under dynamic intrinsic parameter change for augmented reality. Computers & Graphics, 44:11–19.

  • [169] Tan, M. and Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR.

  • [170] Tang, J., Liu, Y., Yu, B., and Liu, X. (2024). 3D Line Matching Network Based on Matching Existence Guidance and Knowledge Distillation. IEEE Internet of Things Journal.

  • [171] Tateno, K., Tombari, F., Laina, I., and Navab, N. (2017). Cnn-slam: Real-time dense monocular slam with learned depth prediction. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6243–6252.

  • [172] Taubner, F., Tschopp, F., Novkovic, T., Siegwart, R., and Furrer, F. (2020). Lcd-line clustering and description for place recognition. In 2020 International Conference on 3D Vision (3DV), pages 908–917. IEEE.

  • [173] Triggs, B., McLauchlan, P. F., Hartley, R. I., and Fitzgibbon, A. W. (2000). Bundle adjustment—a modern synthesis. In Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms Corfu, Greece, September 21–22, 1999 Proceedings, pages 298–372. Springer.

  • [174] Um, K., Brand, R., Fei, Y. R., Holl, P., and Thuerey, N. (2020). Solver-in-the-loop: Learning from differentiable physics to interact with iterative pde-solvers. Advances in Neural Information Processing Systems, 33:6111–6122.

  • [175] Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., and Torr, P. H. (2017). End-to-end representation learning for correlation filter based tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2805–2813.

  • [176] Vasconcelos, F. X. and Nascimento, J. C. (2024). Correspondence Free Multivector Cloud Registration using Conformal Geometric Algebra. arXiv preprint arXiv:2406.11732.

  • [177] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

  • [178] Veličković, P. (2023). Everything is connected: Graph neural networks. Current Opinion in Structural Biology, 79:102538.

  • [179] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.

  • [180] Virtue, P., Stella, X. Y., and Lustig, M. (2017). Better than real: Complex-valued neural nets for MRI fingerprinting. In 2017 IEEE international conference on image processing (ICIP), pages 3953–3957. IEEE.

  • [181] Walch, F., Hazirbas, C., Leal-Taixe, L., Sattler, T., Hilsenbeck, S., and Cremers, D. (2017). Image-based localization using lstms for structured feature correlation. In Proceedings of the IEEE International Conference on Computer Vision, pages 627–637.

  • [182] Wang, W., Mei, G., Zhang, J., Sebe, N., Lepri, B., and Poiesi, F. (2025). Fully-Geometric Cross-Attention for Point Cloud Registration. arXiv preprint arXiv:2502.08285.

  • [183] Wang, Y. and Solomon, J. M. (2019). PRNet: Self-supervised learning for partial-to-partial registration. Advances in Neural Information Processing Systems, 32.

  • [184] Wareham, R., Cameron, J., and Lasenby, J. (2004). Applications of conformal geometric algebra in computer vision and graphics. In Computer algebra and geometric algebra with applications, pages 329–349. Springer.

  • [185] Wareham, R. and Lasenby, J. (2004). Rigid body pose and position interpolation using geometric algebra. Submitted to ACM Transactions on Graphics.

  • [186] Willard, J., Jia, X., Xu, S., Steinbach, M., and Kumar, V. (2020). Integrating physics-based modeling with machine learning: A survey. arXiv preprint arXiv:2003.04919, 1(1):1–34.

  • [187] Witt, J. and Weltin, U. (2013). Robust stereo visual odometry using iterative closest multiple lines. In 2013 IEEE/RSJ international conference on intelligent robots and systems, pages 4164–4171. IEEE.

  • [188] Wu, H., Luo, H., Wang, H., Wang, J., and Long, M. (2024a). Transolver: A Fast Transformer Solver for PDEs on General Geometries. In International Conference on Machine Learning, pages 53681–53705. PMLR.

  • [189] Wu, Y., Zhang, Y., Ma, W., Gong, M., Fan, X., Zhang, M., Qin, A. K., and Miao, Q. (2024b). RORNet: Partial-to-Partial Registration Network With Reliable Overlapping Representations. IEEE Transactions on Neural Networks and Learning Systems, 35(11):15453–15466.

  • [190] Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. (2020). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24.

  • [191] Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1912–1920.

  • [192] Xiang, S. and Li, H. (2020). Revisiting the continuity of rotation representations in neural networks. arXiv preprint arXiv:2006.06234.

  • [193] Xiao, Z., Hao, Z., Lin, B., Deng, Z., and Su, H. (2023). Improved Operator Learning by Orthogonal Attention. In Forty-first International Conference on Machine Learning.

  • [194] Xu, H., Liu, S., Wang, G., Liu, G., and Zeng, B. (2021). OMNet: Learning overlapping mask for partial-to-partial point cloud registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3132–3141.

  • [195] Xu, H., Ye, N., Liu, G., Zeng, B., and Liu, S. (2022). FINet: Dual branches feature interaction for partial-to-partial point cloud registration. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 2848–2856.

  • [196] Xu, Y., Roy, V., and Kitani, K. (2020). Estimating 3d camera pose from 2d pedestrian trajectories. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 2568–2577. IEEE.

  • [197] Yang, J., Anishchenko, I., Park, H., Peng, Z., Ovchinnikov, S., and Baker, D. (2020). Improved protein structure prediction using predicted interresidue orientations. Proceedings of the National Academy of Sciences, 117(3):1496–1503.

  • [198] Ye, J., Ni, J., and Yi, Y. (2017). Deep learning hierarchical representations for image steganalysis. IEEE Transactions on Information Forensics and Security, 12(11):2545–2557.

  • [199] Yun, S., Jeong, M., Kim, R., Kang, J., and Kim, H. J. (2019). Graph transformer networks. Advances in neural information processing systems, 32.

  • [200] Zheng, J., Zhang, J., Li, J., Tang, R., Gao, S., and Zhou, Z. (2020). Structured3d: A large photo-realistic dataset for structured 3d modeling. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX 16, pages 519–535. Springer.

  • [201] Zhou, Y., Barnes, C., Lu, J., Yang, J., and Li, H. (2019). On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5745–5753.

  • [202] Zhou, Y., Gallego, G., Rebecq, H., Kneip, L., Li, H., and Scaramuzza, D. (2018). Semi-dense 3D reconstruction with a stereo event camera. In Proceedings of the European conference on computer vision (ECCV), pages 235–251.