## Chemical Graph Theory

The field of chemical graph theory arose back in the 1970’s when a Japanese chemist by the name of Haruo Hosoya defined a molecular, graph-based structure descriptor Z which he referred to as a topological index. The motivation for studying the topological index was that certain physico-chemical properties (boiling points in particular) of alkanes are correlated with the topological index Z. In this project, we study the topological index known as the Merrifield-Simmons index first defined in 1989. Interestingly, it has been shown that the Merrifield-Simmons index of a path of order n, or P_n, is F_{n+1} where F_n represents the nth Fibonacci number. Motivated by the need to break down carcinogenic PAHs (polycyclic aromatic hydrocarbons), we study certain topological indices of benzenoids, which are condensed polycyclic unsaturated fully conjugated hydrocarbons composed exclusively of six-membered rings. Specifically, using the fact that any benzenoid can be embedded in the Cartesian product of at most three trees, we try to derive a formula for the Merrifield-Simmons index for any benzenoid and ideally identify those benzenoids with the smallest Merrifield-Simmons index with respect to order.

**Mentor:** Kirsti Kuenzel (Trinity College)

## Metric number theory and dynamics

In metric number theory, we study properties of numbers from the measure- theoretic (or “metric”) point of view. A classical result in this area is Émile Borel’s 1909 theorem that almost every real number is normal, meaning that for any b greater than or equal to 2, every finite string of digits from {0,1,…,b-1} appears in its base-b expansion with the frequency that one would naively guess that it should based on the length of the string. (By contrast with this “almost everywhere” result, it is famously difficult to determine that any specific real number is normal. We do not even know whether pi is normal!) One attractive aspect of metric number theory is that the questions that one asks tend to be relatively easy to understand (although they may be extremely difficult to answer), and there is a variety of techniques that one can draw from in order to address them. For example, Borel originally proved the aforementioned result using probability theory; another rather simple proof of it uses basic ideas from ergodic theory, a branch of dynamics. In the past few decades, the tools of dynamics have been used to great effect in metric number theory, particularly in the use of homogeneous dynamics to answer questions in Diophantine approximation, a sister-discipline of metric number theory. In this project, we will explore problems at the interface of metric number theory and dynamics.

**Mentors:** Felipe A Ramírez (Wesleyan University)

## Ambient Dimension of Manifold-fitting Data

The success of algorithms in analysis of high-dimensional data is often attributed to the manifold hypothesis, which supposes that this data often lie on or near a manifold of much lower dimension. (We refer to the original dimension of the data as the ambient dimension) This can be shown to be true in some instances when there is a generative model; for instance, the set of digital photographs of a face taken with varied vertical and horizontal orientations will lie on a manifold of dimension two since there are two degrees of freedom in its movement. Past research has focused on how to reduce the ambient dimension of the data while preserving as much information as possible. In this project, we will use generative models to vary the ambient dimension and investigate how much information we gain when it increases, particularly when there is error in measurement. A large part of the project will involve applying various algorithms to computer-generated data, though there is certainly room for developing a theoretical framework and analysis for our model.

**Mentor:** Kevin O’Neill and Asaf Etgar (Yale)

## Graph Neural Networks

Graph neural networks (GNNs) have seen a high degree of success in node level tasks such as node representation learning and node classification. However, there still exist limitations in the ability of GNNs to distinguish similar but non-isomorphic graphs (such as the strongly regular graphs). Researchers have shown that the classic message-passing GNNs are at most as powerful as the WL test of isomorphism. In order to tackle this issue, researchers have proposed many solutions ranging from utilizing graph-topology-based features to complex aggregation functions, at the cost of significantly higher computational complexity. However, we propose to treat graph features as signals on a graph and study signal processing based wavelet transforms in graphs. In particular we would like to establish results on expressivity of GNNs that leverage multi-level graph wavelet transforms. Among wavelets of study are diffusion wavelets, and holomorphic wavelets. We will also study the composition of such wavelets to form deep networks. Furthermore, preliminary studies show that graph wavelet transforms have the potential to better tackle graph with heterophily, and we aim to demonstrate theoretical advantages of such GNNs in a unified expressiveness framework.

**Mentor:** Smita Krishnaswamy, Rex Ying, Ian Adelstein, Edward De Brouwer (Yale) and Michael Perlmutter (UCLA)

## Saddle Conenctions on Translation Surfaces

Translation surfaces are geometric objects made of polygons with pairs of parallel, equal length sides glued together via translations. An example is the flat torus, which is a square whose opposite sides are identified. Following a straight line in a translation surface can lead to complicated trajectories. One special class of such trajectories are those connecting vertices to vertices, which are called saddle connections. In the torus, those are the lines starting at one corner of the square with rational slopes.

An interesting question is: how random are the slopes of the saddle connections of a given translation surface? For the torus, a version of this question is — how random are the rational numbers between 0 and 1? In that case, it turns out that the fractions between 0 and 1 with denominator at most n distribute uniformly in the unit interval as n goes to infinity. In this sense, they are quite random. However, we may consider a finer test of randomness by considering the distribution of gaps between consecutive fractions of denominator up to n as n tends to infinity. It turns out that using a geometric and dynamical perspective, this distribution can be calculated explicitly. Moreover, this method can be generalized to study the distribution of gaps between slopes of saddle connections on translation surfaces more generally.

This question has been settled for some translation surfaces other than the torus, such as the ones obtained by glueing opposite edges of a regular polygon with an even number of sides (by past SUMRY participants). In this project, we will explore the slope gap distribution of surfaces such as those formed by squares glued in special ways.

**Mentors:** Taylor McAdam and Fernando Al Assal (Yale)