A fast and low-cost method to detect nearduplicate Images in large dataset based on fingerprint extraction and Deep Learning

Nasri Shandiz, Fatemeh

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12104/92292

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Dra. Maciel Arellano, Ma. Del Rocío
dc.contributor.advisor	Dra. Gaytán Lugo, Laura Sanely
dc.contributor.advisor	Dr. Beltrán Ramírez, Jesús Raúl
dc.contributor.author	Nasri Shandiz, Fatemeh
dc.date.accessioned	2023-06-18T20:10:22Z	-
dc.date.available	2023-06-18T20:10:22Z	-
dc.date.issued	2023-03-16
dc.identifier.uri	https://wdg.biblio.udg.mx
dc.identifier.uri	https://hdl.handle.net/20.500.12104/92292	-
dc.description.abstract	Recognizing near-duplicate images in large datasets is a crucial task in image retrieval and content identification. Finding similar images in order to reduce redundancy is time-consuming in large datasets. Most image representation methods aimed at conventional image retrieval are, when used for duplicate detection, either computationally expensive to extract and match or limited in robustness. In this work, we propose a fast, computationally low-cost, and effective method that uses image fingerprints to determine the similarity between a query image and near-duplicate images in a large dataset. For each image in the dataset, we extract a series of fingerprints that combine global and local features, together with a deep learning model used as a fingerprint, and store them in a separate database. We then apply successive filters to the query image, discarding non-similar images along the way until a final set of near-duplicate images is reached. The method discards most non-similar images in the early stages of the process and focuses on robustness in the later stages, where the set of near-duplicate candidates is significantly smaller. This allows the query to be performed on the fly. The proposed method and experimental results provide a good compromise between accuracy and speed in detecting near-duplicate images in a large dataset, even on a low-performance computer such as a home laptop or a workstation.
dc.description.tableofcontents	Table of Contents
Abstract; Dedication; Acknowledgments; Table of Contents; List of Tables; List of Figures
Chapter 1 Introduction: 1.1 Problem Definition; 1.2 Research Question; 1.3 Hypothesis; 1.4 Objective; 1.4.1 General Objective; 1.4.2 Specific Objective; 1.5 Motivation; 1.6 Contributions; 1.7 Thesis Scope; 1.8 Thesis Outline
Review of literature: 2.1 Near-Duplicate Images (NDI); 2.1.2 Image attacks; 2.2 Image Retrieval; 2.2.1 Text-Based Image Retrieval (TBIR); 2.2.1.1 Limitations of Text-Based image retrieval (TBIR); 2.2.2 Semantic-Based Image Retrieval (SBIR); 2.2.2.1 Limitations of Semantic-Based Image Retrieval (SBIR); 2.2.3 Content-Based Image Retrieval (CBIR); 2.3 Fingerprint Method Content-Base Image Retrieval; 2.3.1 Feature Extraction; 2.3.2 Low-level Features; 2.3.3 High-level Features; 2.3.4 The combination of High-level and Low-level Features; 2.4 Calculate Image Similarity; 2.4.1 The Similarity Measurement; 2.4.2 The Metric Distance
Chapter 3 The Proposed Method: 3.1 Phase-One; 3.1.1 Hash Fingerprint; 3.1.2 Color Histogram Fingerprint; 3.1.3 Thumbnail Fingerprint; 3.1.4 Deep Convolutional Neural Network Fingerprint; 3.1.4.1 Very Deep Convolutional Networks VGG19; 3.1.5 ORB (Oriented FAST and rotated BRIEF) Fingerprint; 3.1.6 BOVW (Bag of Visual Words) Fingerprint; 3.2 Phase-Two; 3.2.1 Fuzzy search; 3.2.1.1 BK-Tree Search; 3.2.1.2 Query Image; 3.2.2 Levenshtein Distance; 3.2.3 Cosine Distance; 3.2.4 Jaccard distance; 3.3 Phase-Three; 3.3.2 Similarity Calculation for Fingerprints; 3.3.1 Matching system; 3.3.2.1 Similarity measure of Histogram and Thumbnail; 3.3.2.2 Similarity measure of ORB; 3.3.2.3 Similarity measure of BOVW; 3.3.2.4 Similarity measure of Vgg19; 3.3.3 Voting Process
Chapter 4 The Experimental Results: 4.1 Data Collection; 4.1.1 The Images Databases; 4.1.1.1 Native Data Base; 4.1.1.2 Query Images Data Base; 4.1.1.3 Fingerprint database; 4.2 Database settings; 4.2.1 The attack images; 4.3 Evaluation protocol; 4.3.1 Experimental setup; 4.3.2 Execution the algorithm; 4.3.2.1 Performance Validation; 4.3.3 Recall, Precision and F-Score; 4.3.3.1 Parameters and Performance; 4.3.3.2 Sequential vs OR method results; 4.4 Results Analysis; 4.4.1 Fuzzy Search maximum distances BK-Tree; 4.4.2 Threshold values for the different fingerprints; 4.4.3 Total Similarity Index; 4.4.4 Evaluation of the execution speed; 4.4.6 Improving the algorithm; 4.4.7 Software and user interface; 4.5 Comparison of the proposed method and pure-deep learning method; 4.5.1 Resources consumption; 4.5.2 Results from pure-deep learning method; 4.5.3 Execution both methods
Chapter 5: 5.1 Conclusions; 5.2 Future Work
Appendix Symbols and abbreviations; Appendix1; Appendix2; Appendix3
dc.format	application/PDF
dc.language.iso	eng
dc.publisher	Biblioteca Digital wdg.biblio
dc.publisher	Universidad de Guadalajara
dc.rights.uri	https://www.riudg.udg.mx/info/politicas.jsp
dc.subject	Deep Learning
dc.subject	low-Cost Method
dc.subject	images In Large
dc.title	A fast and low-cost method to detect nearduplicate Images in large dataset based on fingerprint extraction and Deep Learning
dc.title.alternative	A Thesis in the Field of Artificial intelligence and computer vision For the Doctorate Degree in Information Technology
dc.type	Doctoral Thesis
dc.rights.holder	Universidad de Guadalajara
dc.rights.holder	Nasri Shandiz, Fatemeh
dc.coverage	ZAPOPAN JALISCO
dc.type.conacyt	doctoralThesis
dc.degree.name	DOCTORADO EN TECNOLOGIAS DE INFORMACION
dc.degree.department	CUCEA
dc.degree.grantor	Universidad de Guadalajara
dc.rights.access	openAccess
dc.degree.creator	DOCTOR EN TECNOLOGIAS DE INFORMACION
dc.contributor.director	Dr. Orizaga Trejo, José Antonio
dc.contributor.codirector	Dr. Larios Rosillo, Víctor Manuel
Appears in collections:	CUCEA
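
Illustrative note: the abstract describes precomputing fingerprints for every image, storing them separately, and then running the query through successive filters so that cheap checks discard most non-similar images before costlier ones run. The following is a minimal Python sketch of that cascade idea only, not the thesis implementation. The two fingerprints used here (an average-style hash and a coarse grayscale histogram), the function names, and the thresholds `max_hamming` and `min_cosine` are all assumptions chosen for illustration; images are plain nested lists of grayscale values.

```python
from math import sqrt

def average_hash(pixels):
    """Bit-string fingerprint: '1' where a pixel is above the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def histogram(pixels, bins=4, max_value=256):
    """Coarse grayscale histogram, used here as a second fingerprint."""
    counts = [0] * bins
    for row in pixels:
        for p in row:
            counts[p * bins // max_value] += 1
    return counts

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_index(images):
    """Precompute both fingerprints once, stored apart from the images."""
    return {name: (average_hash(px), histogram(px)) for name, px in images.items()}

def find_near_duplicates(query, index, max_hamming=2, min_cosine=0.9):
    """Two-stage cascade; thresholds are hypothetical, not from the thesis."""
    q_hash, q_hist = average_hash(query), histogram(query)
    # Stage 1: cheap hash comparison discards most non-similar images.
    candidates = [n for n, (h, _) in index.items() if hamming(q_hash, h) <= max_hamming]
    # Stage 2: costlier comparison runs only on the surviving candidates.
    return [n for n in candidates if cosine_similarity(q_hist, index[n][1]) >= min_cosine]
```

In this sketch the index is built once with `build_index` and each query calls `find_near_duplicates`; a real system along the lines of the table of contents would add further stages (for example ORB, BOVW, or VGG19 features) after the cheap ones.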

Files in this item:

File	Size	Format
DCUCEA10120FT.pdf	5.86 MB	Adobe PDF
