This survey article explores graph-based approaches to multimodal human activity recognition in indoor environments, emphasizing their relevance to advancing multimodal representation and reasoning. As integrating diverse data sources such as sensor events, contextual information, and spatial data becomes increasingly important, effective human activity recognition methods are essential for applications in smart homes, digital health, and related domains. We review graph-based techniques, highlighting their strengths in encoding complex relationships and improving recognition performance. We also discuss the computational efficiency of these methods and their ability to generalize across different environments. By providing a comprehensive overview of the state of the art in graph-based human activity recognition, this article aims to contribute to the development of more accurate, interpretable, and robust multimodal systems for understanding human activities in indoor settings.