Sarah Pardo

14 December 2022

The Abu Dhabi AI Connect conference delivered two days of talks from distinguished figures working at the cutting edge of AI research and practice. Several shared themes stood out as an indication of trends in the field.

*Making the new unreasonable effectiveness reasonable*

In recent years, the mathematical community has worked to develop a stronger formal understanding of learning methods involving neural networks, and to identify properties and components of neural networks (NNs) with equivalent objects in various areas of mathematics. Much of this work revolves around the question of what, specifically and most fundamentally, contributes to the distinctive power of NNs, and the talks presented several areas of inquiry as candidate answers:

**Structure of the network:**
NNs are concatenations of function dictionaries, in contrast with traditional dictionaries, which are typically superpositions (e.g. wavelets, Fourier modes).
This distinction may be one source of their distinctive approximation power.

**Learning large scale interactions:**
NN layers learn filters, but importantly they also learn scale interactions.
Comparable performance on vision tasks can be achieved when the filters are not learned but implemented as fixed wavelet transforms, suggesting that the power lies more in the learned scale relations.
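As an illustrative sketch (assumed for this write-up, not code from the talks), a "convolutional" layer can use fixed Haar wavelet filters in place of learned ones, leaving only the relations between the resulting scale channels to be learned:

```python
import numpy as np

# Fixed 2x2 Haar filters: local average plus horizontal,
# vertical, and diagonal detail. Nothing here is learned.
HAAR = {
    "low": np.array([[1, 1], [1, 1]]) / 2.0,
    "h":   np.array([[1, -1], [1, -1]]) / 2.0,
    "v":   np.array([[1, 1], [-1, -1]]) / 2.0,
    "d":   np.array([[1, -1], [-1, 1]]) / 2.0,
}

def haar_layer(img):
    """One 'conv layer' with fixed (non-learned) wavelet filters,
    stride 2: returns four half-resolution feature maps."""
    out = {}
    for name, f in HAAR.items():
        out[name] = np.array([
            [(img[i:i + 2, j:j + 2] * f).sum()
             for j in range(0, img.shape[1], 2)]
            for i in range(0, img.shape[0], 2)
        ])
    return out

img = np.arange(16.0).reshape(4, 4)
feats = haar_layer(img)
```

A learned model built on top of `feats` would then only combine these fixed multiscale channels, which is the sense in which the power may lie in the learned scale relations rather than in the filters themselves.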

**Iterative optimization algorithms:**
NNs can be viewed as parameterized iterative algorithms, where the particular advantage is in the learned parameters.
For instance, when NNs are used to implement optimization algorithms, this introduces an interesting added capacity: the ability to optimize the optimizer, as it were.
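A toy illustration of this "optimize the optimizer" idea (an assumed example, not from the talks): treat the step size of a gradient-descent loop as the learned parameter, and tune it with an outer search over runs of the inner algorithm:

```python
import numpy as np

def inner_opt(lr, steps=20):
    """Parameterized gradient descent on f(x) = (x - 3)^2.

    The step size `lr` plays the role of a learned parameter
    of the iterative algorithm; returns the final loss.
    """
    x = 0.0
    for _ in range(steps):
        grad = 2 * (x - 3)  # gradient of (x - 3)^2
        x -= lr * grad
    return (x - 3) ** 2

# Outer loop: optimize the optimizer's parameter.
# (Grid search here; in practice this would itself be gradient-based.)
lrs = np.linspace(0.01, 0.9, 50)
best_lr = min(lrs, key=inner_opt)
```

For this quadratic the optimal step size is 1/2 (the update contracts the error by a factor |1 - 2·lr| per step), and the outer search recovers a value near it.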

**High-dimensionality:**
NNs are uniquely capable of extracting structure from high-dimensional problems, and are also susceptible to analysis as high-dimensional objects (e.g. infinite-width limits).
NNs can model dynamical systems, and are themselves stochastic dynamical systems, and this duality may underlie their unique capabilities.

**Statistical physics:**
Statistical physics is naturally implicated as a discipline centered on strongly interacting disordered systems with many components.
For instance, a seemingly high-dimensional system can have a reduced and more-tractable effective dimension, and sub-manifolds can be identified;
this kind of analysis has revealed "perceptual" sub-manifolds in data such as MNIST.

**Mean field games:**
While the size of a population in a system may go to infinity, it may be the case that the system has only a few explanatory variables.
It may be that the effectiveness of NNs lies in the ability of an architecture to identify such hidden variables.

*Using what’s already there*

Neural networks have proven to be particularly effective in cases where the underlying mapping lacks strong regularity properties, or at least those properties are not well known in advance. However, many applications do contain elements which are potentially quite well known, and have an existing body of analytic techniques which can form a foundation or inspiration for design of neural network models.

**Model-based deep learning:**
In model-based deep learning, the "learning" component is introduced “in a gradual way.” One takes an existing model or solver and substitutes parts of the algorithm with relatively simple networks.
This leads to architectures which can be considerably simpler and more interpretable, without losing performance.
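A minimal sketch of this substitution pattern (the problem, dimensions, and thresholds here are illustrative assumptions, not details from the talks): start from a classical iterative solver, here ISTA for sparse recovery, and promote a hand-tuned component, the shrinkage threshold, into a per-iteration parameter that could be learned, in the spirit of LISTA-style unrolling:

```python
import numpy as np

def soft_threshold(x, t):
    """Shrinkage step from classical ISTA."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unrolled_ista(A, y, thresholds):
    """ISTA unrolled for len(thresholds) iterations.

    Classical ISTA uses one fixed threshold; the model-based
    idea is to make each iteration's threshold a learnable
    parameter (here simply supplied as an array).
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # safe step size
    x = np.zeros(A.shape[1])
    for t in thresholds:
        x = soft_threshold(x + step * A.T @ (y - A @ x), t)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]  # sparse ground truth
y = A @ x_true

x_hat = unrolled_ista(A, y, thresholds=np.full(100, 0.05))
```

The structure of the solver is kept intact and interpretable; only the thresholds (and potentially the step size or a small network in place of the shrinkage) would be learned from data.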

**Using inductive bias in the treatment of data:**
It is important to apply available knowledge of the spaces in which data lives, for instance distinguishing image-based computer vision tasks for which Euclidean geometry is sufficient, from tasks like brain connectomics with data that live on a sphere, for which Riemannian methods are required.
This kind of prior knowledge can inform the design of models which enforce applicable symmetries or invariance properties.
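One simple way to enforce such a symmetry (an illustrative sketch, not taken from the talks) is group averaging: average an arbitrary feature map over the relevant symmetry group, here the group of 90-degree image rotations, making the result invariant by construction:

```python
import numpy as np

def feature(img):
    """Some arbitrary, NOT rotation-invariant feature map."""
    return img[0, 0] + 2.0 * img.sum(axis=0).max()

def rotation_invariant(f, img):
    """Enforce invariance to 90-degree rotations by averaging
    the feature over all four rotations of the input."""
    return np.mean([f(np.rot90(img, k)) for k in range(4)])

img = np.arange(9.0).reshape(3, 3)
a = rotation_invariant(feature, img)
b = rotation_invariant(feature, np.rot90(img))  # rotated input
```

Averaging over the group is the bluntest such construction; equivariant architectures build the same prior into the layers themselves rather than into a final pooling step.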

*Optimizing distributed systems*

The commercial trend in computing is toward increasingly large and embedded systems, and along with it comes the demand for AI methods colocated with these systems. This introduces problems related to training and deploying models in a distributed context.

**Cost vs utility:**
On the one hand, top computing resources today require investment on the order of $50m to $1b, which makes the technology extremely specialized and places the highest levels of computing power squarely in the hands of those with the highest concentrations of capital.
On the other hand, such systems may only utilize a small fraction of their capacity, as low as 1-3%. It’s clear that much more can be done with much smaller resources, given the development and application of improved distributed computing algorithms and communication protocols.

**“Distributed”-ness in computing is a diverse problem:**
Different conditions have different implications for communication requirements (how much of which data needs to be kept for how long, and transferred how often), as well as for potential heterogeneity in noise or in the available granularity of the data itself.
This is important in the development of federated learning, where a cross-device application may involve on the order of 10^5 clients, but each client will potentially participate only once; cross-silo applications may involve fewer (e.g. 10^2) clients, but all clients may participate in all rounds.
General algorithms need to perform well on all of these cases, treating them on a continuum.
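A minimal FedAvg-style sketch (the client counts and the scalar estimation task are illustrative assumptions) shows how a single averaging loop can cover both regimes by varying only the number of clients sampled per round:

```python
import numpy as np

def fedavg(client_data, rounds, clients_per_round, rng):
    """Minimal federated averaging on a scalar mean-estimation task.

    The same loop covers cross-silo (sample all clients each round)
    and cross-device (sample a small fraction each round).
    """
    w = 0.0  # global model: estimate of the overall mean
    for _ in range(rounds):
        chosen = rng.choice(len(client_data),
                            size=clients_per_round, replace=False)
        # Each client takes a local step toward its own data mean,
        # then the server averages the resulting local models.
        local = [w + 0.5 * (np.mean(client_data[i]) - w) for i in chosen]
        w = float(np.mean(local))
    return w

rng = np.random.default_rng(1)
clients = [rng.normal(loc=2.0, scale=0.1, size=20) for _ in range(100)]

w_silo = fedavg(clients, rounds=10, clients_per_round=100, rng=rng)
w_device = fedavg(clients, rounds=50, clients_per_round=5, rng=rng)
```

Treating participation rate as a continuous knob, rather than two separate settings, is one concrete sense in which such algorithms can treat the cross-silo and cross-device cases on a continuum.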

**Scale-free solutions:**
This can be informed by the free energy principle of Karl Friston, which provides a scale-free description of the behavior of systems stemming from the causal partitioning properties of Markov blankets.
The problem can also be formalized by constructing a configuration space, for instance from dimensions of cost, resource, and quality, allowing for a legible analysis of trajectories and construction of optimization problems.

*Centering ethics and safety*

The trend in computing toward increasingly large and embedded systems also naturally poses foundational questions of ethics and safety, in particular surrounding the collection and use of data.

**Imperatives for ethical data use:**
These include enforcing people’s rights over their own information, distributing the value created from data, and using data to maximize social welfare.
Some of the existing work includes the development of frameworks for standardizing privacy preservation in federated learning applications, to facilitate systematic adoption in practice.

**Autonomous systems:**
The discussion of autonomous systems considered the importance of balancing the trustworthiness of a system and the criticality of its task, a calculation which can determine the degree of autonomy the system should be granted.

**Safety in healthcare applications:**
We mean potentially multiple things by AI in healthcare: transparently automating doctors’ thought process; reproducing doctors’ analyses, but possibly with unknown, black-box logic; or finding novel relations in data to develop new treatments.
AI in healthcare affords a natural “Turing test”: for a given task, does the system perform indistinguishably from how a doctor performs, with equal accuracy or quality?
However, this test may not be satisfactory in the black-box case when a model lacks the explanatory power of a doctor, and AI-powered design of new treatments is considered of the same risk level as drug development.

*Takeaways for AI in healthcare*

Many learning problems are not totally unstructured and mysterious; this is particularly true in imaging and medical applications and we can use it to our advantage. Considering the additional importance in healthcare of interpretable models and decision-making, it is also of interest to develop models where the “black-box” components are restricted to points where they are essential in providing a performance advantage. The use of a model-based learning approach is promising in its potential to deliver models with both high performance and improved explainability.

Distributed computing is heavily implicated in healthcare data; health data in particular tends to be distributed in many small, heterogeneous sets, making the aggregation of data from multiple sources particularly appealing. It is also an area where the use of edge devices for patient monitoring and treatment could have a great impact. At the same time, healthcare is an area where it is essential to weigh the trustworthiness of a system with the criticality of the task it fulfills, and to contemplate the ethics of AI, which takes on a heightened importance when the data in question pertains to people’s health. The development of reliable, privacy-preserving models is critical, and as new frameworks and standards are built, health applications must be at the lead in adoption.