Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12104/104815
Full metadata record
DC field | Value
dc.contributor.author | Aguilar Valdez, Sofía Alejandra
dc.date.accessioned | 2024-09-18T17:07:16Z
dc.date.available | 2024-09-18T17:07:16Z
dc.date.issued | 2024-06-24
dc.identifier.uri | https://wdg.biblio.udg.mx
dc.identifier.uri | https://hdl.handle.net/20.500.12104/104815
dc.description.abstract | In this work, I measured interdisciplinary research within a curated corpus derived from the CORD-19 open research dataset. This dataset was chosen on the assumption that the COVID-19 literature represents a collection of highly diverse ideas, reflecting the urgent need to develop vaccines during the pandemic. Treating the diversification of ideas as the movement of concepts across the academic landscape, I proposed a framework to represent contexts. This framework used topic modeling and hierarchical agglomerative clustering to map topic clusters into a low-dimensional space, grouping semantically similar papers together. The analysis revealed distinct topic clusters that, when mapped, displayed branches and subclusters. A quantitative analysis suggested that topics act as different contexts, with branches forming when semantically similar topics mapped onto different locations. Subclusters were attributed to indirect relationships between two uncommon topics through a third topic that shares characteristics with both, reflecting contextual rather than purely semantic features. Finally, the Shannon entropy of each topic was evaluated and compared with values reported for emerging academic fields. The entropy values of all topics were higher than these reference values, validating the presence of megadiverse ideas in the CORD-19 dataset and confirming the methodology as a viable framework for identifying interdisciplinarity in a multidisciplinary corpus.
dc.description.tableofcontents | Abstract; Acknowledgements; 1 Introduction; 2 Materials and methods (2.1 Data collection and processing; 2.2 Topic modeling; 2.3 Topic aggregation; 2.4 Evaluation of interdisciplinary research); 3 Results and discussion (3.1 Description of the publications; 3.2 Lexical diversity in the corpus; 3.3 Description of the topics; 3.4 Assessment of interdisciplinary research; 3.5 Discussion); 4 Conclusions and future directions; Appendix A: Visualization of the topic words; Appendix B: Visualization of the topic documents; Bibliography
dc.format | application/PDF
dc.language.iso | eng
dc.publisher | Biblioteca Digital wdg.biblio
dc.publisher | Universidad de Guadalajara
dc.rights.uri | https://www.riudg.udg.mx/info/politicas.jsp
dc.subject | Topic Modeling
dc.subject | Hierarchical Agglomerative Clustering
dc.title | On finding megadiversity among the corpus of scientific literature
dc.type | Master's thesis
dc.rights.holder | Universidad de Guadalajara
dc.rights.holder | Aguilar Valdez, Sofía Alejandra
dc.coverage | GUADALAJARA, JALISCO
dc.type.conacyt | masterThesis
dc.degree.name | MAESTRIA EN CIENCIAS EN BIOINGENIERIA Y COMPUTO INTELIGENTE
dc.degree.department | CUCEI
dc.degree.grantor | Universidad de Guadalajara
dc.rights.access | openAccess
dc.degree.creator | MAESTRIA EN CIENCIAS EN BIOINGENIERIA Y COMPUTO INTELIGENTE
dc.contributor.director | Morales Valencia, José Alejandro
dc.contributor.codirector | Paredes, Omar
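The abstract reports the Shannon entropy of each topic but does not state which distribution the entropy is taken over or which logarithm base is used. The following is a minimal sketch under loudly stated assumptions: each document is assigned to a single dominant topic and carries one discipline label. Both inputs are hypothetical and are not taken from the thesis. Under these assumptions, the sketch computes H = -Σ p_i log2 p_i over the discipline labels within each topic, so a higher value means a topic draws on a more even mix of fields. The function names `shannon_entropy` and `entropy_per_topic` are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence


def shannon_entropy(labels: Iterable[Hashable], base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) of the empirical label distribution.

    Returns 0.0 for an empty input. The default base 2 gives the result in bits
    (an assumption; the thesis record does not state the base used).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in counts.values():
        p = count / total
        h -= p * math.log(p, base)
    return h


def entropy_per_topic(
    doc_topics: Sequence[int],
    doc_fields: Sequence[Hashable],
    base: float = 2.0,
) -> Dict[int, float]:
    """Entropy of the discipline labels of the documents assigned to each topic.

    Hypothetical inputs: doc_topics[i] is the dominant topic of document i and
    doc_fields[i] is its discipline label. Neither input is defined in the thesis record.
    """
    fields_by_topic: Dict[int, List[Hashable]] = {}
    for topic, field in zip(doc_topics, doc_fields):
        fields_by_topic.setdefault(topic, []).append(field)
    return {t: shannon_entropy(f, base) for t, f in fields_by_topic.items()}


if __name__ == "__main__":
    topics = [0, 0, 0, 0, 1, 1]
    fields = ["virology", "virology", "immunology", "epidemiology", "virology", "virology"]
    print(entropy_per_topic(topics, fields))
```

For example, a topic whose four documents are labeled virology, virology, immunology and epidemiology has an entropy of 1.5 bits, while a topic drawn entirely from one field has an entropy of 0. With k equally represented fields the value reaches its maximum of log2 k.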
Appears in collections: CUCEI

Files in this item:
File | Size | Format
MCUCEI10969FT.pdf | 1.18 MB | Adobe PDF


Items in RIUdeG are protected by copyright, with all rights reserved, unless otherwise indicated.