Summary

Modality availability

  • Seeks to resolve the dependence on modality availability in current multimodal (MM) architectures
    • Modality-specific missingness is a common real-world problem and can fundamentally bias the model when the missingness of a modality is predictive of the label (known as missing not-at-random, MNAR).
      • The common solution of restricting learning to data points with a complete set of modalities creates models that perform inequitably in populations with fewer available resources
      • In complex real-world datasets, there is often no intersection of data points where all modalities are available, necessitating either the exclusion of modalities or a significantly reduced training set.
      • On the other hand, imputation explicitly featurizes missingness, risking a trivial model that uses the presence of features rather than their values for prediction (as sketched below).
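      • A minimal numpy sketch of this failure mode on synthetic data (the probabilities and the presence-only "model" are illustrative assumptions, not from the paper): under MNAR, naive zero-imputation lets the mere presence of a value predict the label.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 10_000
        y = rng.integers(0, 2, n)                 # label, e.g. severe (1) vs mild (0)
        x = rng.normal(0.0, 1.0, n)               # a measurement, uninformative by design
        # MNAR: the test is ordered far more often for severe patients,
        # so the availability of the value itself encodes the label.
        observed = rng.random(n) < np.where(y == 1, 0.9, 0.1)
        x_imputed = np.where(observed, x, 0.0)    # naive zero-imputation

        # Predicting from presence alone already beats chance by a wide margin:
        pred = (x_imputed != 0.0).astype(int)
        print("accuracy from missingness alone:", (pred == y).mean())  # ~0.9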

Example

  • The MNAR issue is particularly common in medicine, where modality acquisition is dependent on the decision of the healthcare worker (i.e. the decision that the model is usually attempting to emulate).
  • For example, a patient with a less severe form of a disease may receive less intensive monitoring and may not undergo advanced imaging.
  • If the goal is to predict prognosis, the model could use the missingness of a test rather than its value.
  • This is a fundamental flaw and can lead to catastrophic failure in situations where the modality is unavailable for independent reasons (for instance, resource limitations).

Design

  • MultiModN is a multimodal, modular network that fuses latent representations in a sequence of any number, combination, or type of modality
    • while providing granular real-time predictive feedback on any number or combination of predictive tasks.
    • does not compromise performance compared with a baseline of parallel fusion.
    • By simulating the challenging bias of missing not-at-random (MNAR) data,
      • this work shows that, unlike MultiModN, parallel fusion baselines erroneously learn the MNAR missingness pattern and suffer catastrophic failure when faced with different patterns of missingness at inference (see the sketch below).
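  • A rough scikit-learn sketch of this evaluation idea on synthetic data (a stand-in for a parallel-fusion baseline; not the paper's actual experiment): a classifier trained where availability tracks the label looks strong under the same pattern, then degrades sharply when the pattern shifts.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)

        def simulate(n, mnar):
            """One modality whose availability tracks the label iff mnar=True."""
            y = rng.integers(0, 2, n)
            x = rng.normal(y.astype(float), 1.0)             # value is mildly predictive
            p_obs = np.where(y == 1, 0.9, 0.1) if mnar else np.full(n, 0.5)
            mask = (rng.random(n) < p_obs).astype(float)     # availability indicator
            X = np.column_stack([np.where(mask == 1, x, 0.0), mask])
            return X, y

        X_tr, y_tr = simulate(10_000, mnar=True)       # training data: MNAR
        clf = LogisticRegression().fit(X_tr, y_tr)

        X_iid, y_iid = simulate(10_000, mnar=True)     # same missingness pattern
        X_ood, y_ood = simulate(10_000, mnar=False)    # pattern changes at inference
        print("same-pattern accuracy:   ", clf.score(X_iid, y_iid))  # inflated by the mask
        print("shifted-pattern accuracy:", clf.score(X_ood, y_ood))  # drops sharply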

Thoughts

  • MultiModN can be completely order-invariant and idempotent if module order is randomized during training.

Detailed

Notation

  • We have a set of state vectors (s) that carry the model's running representation and are passed sequentially from encoder to encoder.

  • Comparison of modular MultiModN (a) vs. monolithic P-Fusion (b)

    • MultiModN inputs any number/combination of modalities (mod) into a sequence of mod-specific encoders (e).
    • It can skip over missing modalities.
    • A state (s) is passed to the subsequent encoder and updated.
    • Each state can be fed into any number/combination of decoders (d) to predict multiple tasks.
    • Modules are identified as grey blocks comprising an encoder, a state, and a set of decoders
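  • A minimal PyTorch sketch of this state-passing design (dimensions, layer choices, and the name MultiModNSketch are illustrative assumptions, not the authors' implementation):

        import torch
        import torch.nn as nn

        STATE_DIM = 32

        class Encoder(nn.Module):
            """Modality-specific encoder e: folds one modality into the running state."""
            def __init__(self, mod_dim):
                super().__init__()
                self.net = nn.Linear(STATE_DIM + mod_dim, STATE_DIM)

            def forward(self, state, mod):
                return torch.tanh(self.net(torch.cat([state, mod], dim=-1)))

        class MultiModNSketch(nn.Module):
            def __init__(self, mod_dims, n_tasks):
                super().__init__()
                self.encoders = nn.ModuleList(Encoder(d) for d in mod_dims)
                # One decoder d per task; each can read any intermediate state.
                self.decoders = nn.ModuleList(nn.Linear(STATE_DIM, 1) for _ in range(n_tasks))
                self.s0 = nn.Parameter(torch.zeros(STATE_DIM))  # learnable initial state

            def forward(self, mods):
                # mods: one tensor per modality, or None when it is missing,
                # in which case its encoder is skipped and the state passes through.
                state = self.s0.expand(1, -1)
                for enc, mod in zip(self.encoders, mods):
                    if mod is not None:
                        state = enc(state, mod)
                # Granular feedback: every task is predictable from the current state.
                return [dec(state) for dec in self.decoders]

        model = MultiModNSketch(mod_dims=[8, 16], n_tasks=2)
        preds = model([torch.randn(1, 8), None])  # second modality missing: skipped
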
  • A module of a modular model is defined as a self-contained computational unit that can be isolated, removed, added, substituted, or ported.

  • It is also desirable for modules to be order-invariant and idempotent, i.e., multiple applications of the same module have no additive effect.

    • achieved through randomization during training
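
  • A sketch of what that randomization could look like at train time, reusing MultiModNSketch from the sketch above (the module-repetition trick for idempotence is an assumption; the notes only state that randomization during training is used):

        import random
        import torch

        def forward_randomized(model, mods):
            """Apply the available encoders in a random order, occasionally
            repeating one, so the learned state update is pushed toward
            order invariance and idempotence (a repeat should be a no-op)."""
            order = [i for i, m in enumerate(mods) if m is not None]
            random.shuffle(order)                      # random module order
            if order and random.random() < 0.5:
                order.append(random.choice(order))     # duplicate one module
            state = model.s0.expand(1, -1)
            for i in order:
                state = model.encoders[i](state, mods[i])
            return [dec(state) for dec in model.decoders]

        model = MultiModNSketch(mod_dims=[8, 16], n_tasks=2)  # from the sketch above
        preds = forward_randomized(model, [torch.randn(1, 8), torch.randn(1, 16)])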