MULTI-MODAL DOCUMENT CONTEXT SEARCH with LLMs for MANUFACTURING INDUSTRIES
Mälardalen University, School of Innovation, Design and Engineering. Alstom Group.
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits. Student thesis.
Abstract [en]

Manufacturing industries rely on vast collections of multi-modal documents for product development and maintenance, encompassing hardware specifications, software documentation, and technical diagrams across diverse formats. Efficiently retrieving relevant information from these complex documents presents significant challenges due to their domain-specific terminology and structural complexity. This thesis investigates the application of Retrieval-Augmented Generation (RAG) systems with Large Language Models (LLMs) for manufacturing document search and question answering. The research compares traditional vector-based RAG approaches with advanced graph-based methods like LightRAG to evaluate their effectiveness for industrial documentation retrieval. A comprehensive preprocessing pipeline was developed to handle multi-modal content, extracting structured information from text, tables, and technical diagrams while preserving document context. Experimental evaluations using documents from the European Union Agency for Railways demonstrate that different RAG architectures excel in different scenarios: vector-based approaches with advanced prompting strategies performed well for specific low-level queries, while graph-based global retrieval strategies showed superior performance for complex questions requiring synthesis across multiple documents. While automated metrics showed advanced prompting strategies achieving higher ROUGE and BLEU scores, manual analysis revealed that graph-based methods often produced more comprehensive and contextually relevant answers for complex queries. This research contributes to the understanding of RAG systems for industrial applications and provides insights for optimizing multi-modal document retrieval in manufacturing contexts.
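The contrast the abstract draws between automated overlap scores and manual judgement is easier to follow with a concrete metric. The following is a minimal sketch of a simplified ROUGE-1 (unigram overlap) score in Python. It is not the evaluation code used in the thesis; the function name `rouge1`, the whitespace tokenisation, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: unigram precision, recall and F1.

    Assumption: lowercase whitespace tokenisation, no stemming.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection gives clipped unigram matches.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference answer and two hypothetical system answers.
reference = "the brake system must be inspected every six months"
short_answer = "the brake system must be inspected every six months"
long_answer = (
    "the brake system must be inspected every six months and "
    "the inspection record must be kept with the maintenance log"
)

print(rouge1(short_answer, reference))
print(rouge1(long_answer, reference))
```

In this toy case the longer answer contains every reference word (recall 1.0) but scores a lower precision and F1 because of its extra words. This shows one way an overlap metric can rank a more detailed answer below a terse one; the abstract does not say whether this effect explains the reported results.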

Place, publisher, year, edition, pages
2025. p. 40
National Category
Computer Vision and Learning Systems
Identifiers
URN: urn:nbn:se:mdh:diva-73552
OAI: oai:DiVA.org:mdh-73552
DiVA, id: diva2:2004353
External cooperation
Alstom Group
Subject / course
Computer Science
Supervisors
Examiners
Available from: 2025-10-08. Created: 2025-10-07. Last updated: 2025-10-10. Bibliographically approved.

Open Access in DiVA

fulltext (3170 kB), 100 downloads
File information
File name: FULLTEXT01.pdf
File size: 3170 kB
Checksum (SHA-512): 13fb5dbb9ce60be8e212e453c47a335defef211140d7f06c6cf9c9d517b623a43bec353d44dbfa35c70b395f2389a031d168df2fd1910c30c952ac268f6187d5
Type: fulltext
Mimetype: application/pdf

The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 899 hits