Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.12104/104815
Title: | On finding megadiversity among the corpus of scientific literature |
Author: | Aguilar Valdez, Sofía Alejandra |
Thesis advisor: | Morales Valencia, José Alejandro |
Keywords: | Topic Modeling;Hierarchical Agglomerative Clustering |
Issue Date: | 24-Jun-2024 |
Publisher: | Biblioteca Digital wdg.biblio Universidad de Guadalajara |
Abstract: | In this work, I measured interdisciplinary research within a curated corpus derived from the CORD-19 open research dataset. This dataset was chosen on the assumption that COVID-19 literature represents a collection of highly diverse ideas, reflecting the urgent need to develop vaccines during the pandemic. Treating the diversification of ideas as the movement of concepts across the academic landscape, I proposed a framework to represent contexts. This framework used topic modeling and hierarchical agglomerative clustering to map topic clusters into a low-dimensional space, grouping semantically similar papers together. The analysis revealed distinct topic clusters that, when mapped, displayed branches and subclusters. A quantitative analysis suggested that topics act as different contexts, with branches forming when semantically similar topics mapped onto different locations. Subclusters were attributed to indirect relationships between two uncommon topics through a third topic sharing characteristics with both, reflecting contextual rather than purely semantic features. Finally, the Shannon entropy of each topic was evaluated and compared with values reported for emerging academic fields. The entropy values for all topics were higher than those reported values, validating the presence of megadiverse ideas in the CORD-19 dataset and confirming the methodology as a viable framework for identifying interdisciplinarity in a multidisciplinary corpus. |
URI: | https://wdg.biblio.udg.mx https://hdl.handle.net/20.500.12104/104815 |
Degree name: | MAESTRIA EN CIENCIAS EN BIOINGENIERIA Y COMPUTO INTELIGENTE |
Appears in Collections: | CUCEI |
Files in This Item:
File | Size | Format |
---|---|---|
MCUCEI10969FT.pdf | 1.18 MB | Adobe PDF |
Items in RIUdeG are protected by copyright, with all rights reserved, unless otherwise indicated.