Review on Implicit Neural Representations and its Applications in Medical Imaging
Implicit Neural Representation (INR)
- Explicit representation
- Conventional approaches to encoding input signals as representations typically follow an explicit paradigm, where the input space is discretized or partitioned into separate elements (e.g., point clouds, voxel grids, and meshes).
- Explicit (or discrete) representations directly encode the features or signal values.
- Implicit neural representation (INR)
- Implicit representations are defined as a generator function that maps input coordinates to their corresponding value within the input space.
- A Multi-Layer Perceptron (MLP) is trained to parameterize the signal of interest, such as an image or shape, utilizing coordinates as input.
- The objective is to predict the corresponding data values at those coordinates. Thus, the MLP serves as an Implicit Neural Representation function that encodes the signal’s representation within its weights.
For INR, a simple MLP can be learned to continuously represent the signal of interest as an implicit function \begin{equation} \Psi: \mathbf{x} \in \mathbb{R}^M \to \Psi (\mathbf{x}) \in \mathbb{R}^N, \end{equation} mapping their \(M\)-dimensinal spatial coordinates to \(N\)-dimensional values (e.g., color, occupancy, etc.).
Input
- The conventional approach in INR treats the spatial coordinate of each element in the signal, such as pixels in an image, as the input to an MLP. However, this approach tends to learn low-frequency functions, limiting its ability to effectively represent complex signals.
- Using a sinusoidal mapping of the Cartesian coordinates to a higher dimensional space, which enables the learning of high-frequency details more effectively. Denote the signal coordinates as \(\mathbf{v}\), we have different ways of incorporate positional information,
- Basic: \(\gamma (\mathbf{v}) = \left[ \cos 2\pi\mathbf{v}, \sin 2\pi\mathbf{v} \right]^\top\).
- Positional encoding (PE): \(\gamma (\mathbf{v}) = \left[ \cdots, \cos 2\pi\sigma^{j/m}\mathbf{v}, \sin 2\pi\sigma^{j/m}\mathbf{v}, \cdots \right]^\top\) for \(j = 0, \cdots, m-1\). The hyperparameter \(\sigma\) is selected considering different tasks and datasets.
- Gaussian PE: \(\gamma (\mathbf{v}) = \left[ \cos 2\pi\mathbf{B}\mathbf{v}, \sin 2\pi\mathbf{B}\mathbf{v} \right]^\top\), where \(\mathbf{B}\) is a matrix whose elements are sampled from a Gaussian distribution \(\mathcal{N} \left( 0, \sigma^2 \right)\). The scale \(\sigma\) is selected through a hyperparameter sweep for each task and dataset.
Activations
- For implicit representations, nonlinearities can be either periodic or nonperiodic. However, non-periodic functions, such as ReLU or Tanh, are not conducive to the effective learning of highfrequency signals.
- To address this issue, sine functions can be utilized as the activation function of the MLP to parametrize complex data, \begin{equation} \Psi (\mathbf{x}) = \mathbf{W} _n \left(\psi _{n-1} \circ \psi _{n-2} \circ \cdots \circ \psi _0 \right) (\mathbf{x}) + \mathbf{b} _n, \end{equation} \begin{equation} \mathbf{x} _i \mapsto \psi _i (\mathbf{x} _{i}) = \sin \left( \mathbf{W} _i \mathbf{x} _i + \mathbf{b} _i \right). \end{equation} where \(\psi _i\) indicates the \(i\)-th layer of the MLP, \(\mathbf{W} _i\), \(\mathbf{b} _i\) is the corresponding weight and bias, and \(\mathbf{x}\) is the signal of interest.
Neural Radiance Field
- Neural Radiance Fields (NeRFs) unites INRs with volume rendering by using fully connected MLPs to implicitly represent scenes and objects with the goal of novel view synthesis. The objective of novel view synthesis is to develop a system that can generate novel viewpoints of an object from any direction by observing a few images of that particular object. The process is defined as \begin{equation} F(\mathbf{x}, \mathbf{d}) \to (\mathbf{c}, \sigma). \end{equation} where \(\mathbf{x}\) indicates the 3D location \((x, y, z)\), \(d\) represents the 2D vector of viewing direction \((\theta, \phi)\), \(\mathbf{c}\) denotes the RGB color values \((R, G, B)\), and \(\sigma\) is the voxel intensity.
- Original NeRF uses ReLU activation and a positional encoding approach to map coordinates to higher dimensions, \begin{equation} \gamma (p) = \left( \sin 2^0 \pi p, \cos 2^0 \pi p, \cdots, \sin 2^{L-1} \pi p, \cos 2^{L-1} \pi p \right), \end{equation} where \(p\) can be each of the coordinate or viewing direction components.
- NeRF follows a two-stage procedure to obtain voxel density and color values,
- \(\sigma, h = \text{MLP} (\mathbf{x})\).
- \(c = \text{MLP} ([h, d])\).
Review of INR in Medical Imaging
Medical Image Reconstruction
Neural Rendering
Enjoy Reading This Article?
Here are some more articles you might like to read next: