Behavioral Data Categorization for Transformers-based Models in Digital Health

Clauirton Siebra¹, Igor Matias¹, and Katarzyna Wac¹

¹Quality of Life Technologies Lab, University of Geneva, Switzerland

Abstract: Transformers are recent deep learning (DL) models used to capture dependencies between parts of sequential data. While their potential has already been demonstrated in the natural language processing (NLP) domain, emerging research shows that transformers can also be an adequate modeling approach for relating longitudinal, multi-featured, continuous behavioral data to future health outcomes. Because transformer-based predictions rely on a domain lexicon, categories, which specialized areas commonly use to cluster values, are the likely way to compose such lexica. However, the number of categories may influence the transformer's prediction accuracy, mainly when the categorization process creates imbalanced datasets or when the search space is too restricted to yield optimal feasible solutions. This paper analyzes the relationship between model accuracy and the sparsity of the behavioral data categories that compose the lexicon. The analysis relies on a case example that uses mQoL-Transformer to model the influence of physical activity behavior on sleep health. Results show that the number of categories should be treated as an additional transformer hyperparameter, one that can balance literature-based categorization against optimization aspects. Thus, DL processes could also obtain accuracies similar to those of traditional approaches, such as long short-term memory, when used to process short behavioral data sequences.
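To make the categorization idea concrete, the following minimal sketch shows how continuous behavioral values might be mapped to lexicon tokens, and how the number of categories changes the class balance. It is an illustration only, not the procedure used in the paper: the equal-width binning rule, the function names `equal_width_edges` and `categorize`, the `steps_` token prefix, and the step counts are all assumptions introduced here.

```python
from bisect import bisect_right
from collections import Counter


def equal_width_edges(values, k):
    """Return the k-1 interior edges that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def categorize(values, edges, prefix="steps"):
    """Map each continuous value to a lexicon token such as 'steps_2'."""
    # bisect_right gives the index of the bin that contains the value.
    return [f"{prefix}_{bisect_right(edges, v)}" for v in values]


# Hypothetical daily step counts (illustrative values, not data from the paper).
daily_steps = [1200, 3400, 3900, 4100, 4800, 5200, 6100, 7500, 9800, 15000]

# Treat the number of categories k as a tunable setting and inspect the balance.
for k in (2, 3, 5):
    tokens = categorize(daily_steps, equal_width_edges(daily_steps, k))
    counts = Counter(tokens)
    # Ratio of the most to the least frequent used category: a crude imbalance indicator.
    imbalance = max(counts.values()) / min(counts.values())
    print(k, dict(counts), round(imbalance, 2))
```

Running the loop prints, for each candidate number of categories, how many samples fall into each token. With skewed data such as these step counts, equal-width bins concentrate most samples in the lowest categories, which is the kind of imbalance the abstract identifies as a risk to prediction accuracy. Literature-based thresholds or quantile-based bins could be substituted for `equal_width_edges` without changing the rest of the sketch.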

In 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)