Umme Sara, Morium Akter, Mohammad Shorif Uddin
Quality is an important parameter for all objects and their functionalities. In image-based object recognition, image quality is a prime criterion. Authentic image quality evaluation requires ground truth, which in practice is very difficult to obtain. Image quality is usually assessed with full-reference metrics such as MSE (Mean Square Error) and PSNR (Peak Signal-to-Noise Ratio). In contrast to MSE and PSNR, two more recent full-reference metrics, SSIM (Structured Similarity Indexing Method) and FSIM (Feature Similarity Indexing Method), were developed to compare the structural and feature similarity between restored and original objects on the basis of perception. This paper focuses on comparing different image quality metrics to give a comprehensive view. Experiments with these metrics are performed on benchmark images through denoising at different noise concentrations. All metrics gave consistent results. However, from a representation perspective, SSIM and FSIM are normalized, whereas MSE and PSNR are not; and from a semantic perspective, MSE and PSNR give only absolute error, whereas SSIM and FSIM give perception- and saliency-based error. SSIM and FSIM can therefore be treated as more understandable than MSE and PSNR.
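To make the two absolute-error metrics concrete, here is a minimal sketch of MSE and PSNR over flattened pixel lists. The function names, the flat-list image representation, and the default peak value of 255 (8-bit images) are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], restored: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened images."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    return sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; max_value is the assumed peak intensity."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 4-pixel example (values are illustrative only).
ref = [0, 255, 128, 64]
out = [0, 250, 130, 64]
print(mse(ref, out), round(psnr(ref, out), 2))
```

Neither value is normalized: MSE grows with the squared intensity range and PSNR is unbounded above, which is the representational difference from SSIM and FSIM that the abstract points out.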
Liang Zheng, Hengheng Zhang, Shaoyan Sun, Manmohan Chandraker et al.
This paper presents a novel large-scale dataset and comprehensive baselines for end-to-end pedestrian detection and person recognition in raw video frames. Our baselines address three issues: the performance of various combinations of detectors and recognizers, mechanisms for pedestrian detection to help improve overall re-identification (re-ID) accuracy and assessing the effectiveness of different detectors for re-ID. We make three distinct contributions. First, a new dataset, PRW, is introduced to evaluate Person Re-identification in the Wild, using videos acquired through six synchronized cameras. It contains 932 identities and 11,816 frames in which pedestrians are annotated with their bounding box positions and identities. Extensive benchmarking results are presented on this dataset. Second, we show that pedestrian detection aids re-ID through two simple yet effective improvements: a cascaded fine-tuning strategy that trains a detection model first and then the classification model, and a Confidence Weighted Similarity (CWS) metric that incorporates detection scores into similarity measurement. Third, we derive insights in evaluating detector performance for the particular scenario of accurate person re-ID.
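The abstract does not give the formula for the Confidence Weighted Similarity, so the following is only a sketch of one plausible form: the appearance similarity between a query and a detected gallery box is scaled by that box's detection confidence. The use of cosine similarity, the multiplicative weighting, and all names are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def confidence_weighted_similarity(query: Sequence[float],
                                   gallery: Sequence[float],
                                   detection_confidence: float) -> float:
    """Assumed form: appearance similarity scaled by detector confidence in [0, 1]."""
    return cosine_similarity(query, gallery) * detection_confidence


# Hypothetical gallery: a perfect appearance match from a doubtful detection
# versus a weaker match from a confident detection.
query = [1.0, 0.0]
doubtful = confidence_weighted_similarity(query, [1.0, 0.0], 0.2)
confident = confidence_weighted_similarity(query, [1.0, 1.0], 0.9)
print(doubtful, confident)
```

Under this assumed weighting, a low-confidence box (likely background or a misaligned detection) is pushed down the ranking even when its features happen to match the query well.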
Sivaramakrishnan Rajaraman, Sameer Antani, Mahdieh Poostchi, Kamolrat Silamut et al.
Malaria is a blood disease caused by Plasmodium parasites transmitted through the bite of the female Anopheles mosquito. Microscopists commonly examine thick and thin blood smears to diagnose disease and compute parasitemia. However, their accuracy depends on smear quality and expertise in classifying and counting parasitized and uninfected cells. Such an examination could be arduous for large-scale diagnoses, resulting in poor quality. State-of-the-art image-analysis-based computer-aided diagnosis (CADx) methods using machine learning (ML) techniques, applied to microscopic images of the smears using hand-engineered features, demand expertise in analyzing morphological, textural, and positional variations of the region of interest (ROI). In contrast, Convolutional Neural Networks (CNN), a class of deep learning (DL) models, promise highly scalable and superior results with end-to-end feature extraction and classification. Automated malaria screening using DL techniques could, therefore, serve as an effective diagnostic aid. In this study, we evaluate the performance of pre-trained CNN-based DL models as feature extractors toward classifying parasitized and uninfected cells to aid in improved disease screening. We experimentally determine the optimal model layers for feature extraction from the underlying data. Statistical validation of the results demonstrates the use of pre-trained CNNs as a promising tool for feature extraction for this purpose.
Xuecai Hu, Haoyuan Mu, Xiangyu Zhang, Zilei Wang et al.
Recent research on super-resolution has achieved great success due to the development of deep convolutional neural networks (DCNNs). However, super-resolution of arbitrary scale factor has been ignored for a long time. Most previous researchers regard super-resolution of different scale factors as independent tasks. They train a specific model for each scale factor, which is computationally inefficient, and prior work only takes the super-resolution of several integer scale factors into consideration. In this work, we propose a novel method called Meta-SR to solve, for the first time, super-resolution of arbitrary scale factor (including non-integer scale factors) with a single model. In our Meta-SR, the Meta-Upscale Module is proposed to replace the traditional upscale module. For an arbitrary scale factor, the Meta-Upscale Module dynamically predicts the weights of the upscale filters by taking the scale factor as input and uses these weights to generate the HR image of arbitrary size. For any low-resolution image, our Meta-SR can continuously zoom in on it with an arbitrary scale factor using only a single model. We evaluated the proposed method through extensive experiments on widely used benchmark datasets for single image super-resolution. The experimental results show the superiority of our Meta-Upscale.
Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, but they often suffer from a limited number of parts and heavy computational cost. In this paper, we propose to learn such fine-grained features from hundreds of part proposals with a Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. Specifically, TASN consists of 1) a trilinear attention module, which generates attention maps by modeling the inter-channel relationships, 2) an attention-based sampler, which highlights attended parts with high resolution, and 3) a feature distiller, which distills part features into an object-level feature by weight sharing and feature preserving strategies. Extensive experiments verify that TASN yields the best performance under the same settings as the most competitive approaches on the iNaturalist-2017, CUB-Bird, and Stanford-Cars datasets.
Shanto Rahman, Md. Mostafijur Rahman, M. Abdullah‐Al‐Wadud, Golam Dastegir Al-Quaderi et al.
Due to the limitations of image-capturing devices or the presence of a non-ideal environment, the quality of digital images may get degraded. In spite of much advancement in imaging science, captured images do not always fulfill users’ expectations of clear and soothing views. Most of the existing methods mainly focus on either global or local enhancement that might not be suitable for all types of images. These methods do not consider the nature of the image, whereas different types of degraded images may demand different types of treatments. Hence, we classify images into several classes based on the statistical information of the respective images. Afterwards, an adaptive gamma correction (AGC) is proposed to appropriately enhance the contrast of the image where the parameters of AGC are set dynamically based on the image information. Extensive experiments along with qualitative and quantitative evaluations show that the performance of AGC is better than other state-of-the-art techniques.
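Gamma correction itself is the power-law mapping of a normalized intensity, out = in^gamma. The sketch below shows that mapping together with one simple way of setting gamma from image statistics. The abstract does not state how AGC sets its parameters, so the rule used here (pick gamma so that the mean intensity maps to mid-gray) is a hypothetical stand-in, not the paper's method, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def gamma_correct(pixels: Sequence[float], gamma: float) -> List[float]:
    """Power-law mapping for intensities normalized to [0, 1]."""
    return [p ** gamma for p in pixels]


def gamma_from_mean(pixels: Sequence[float], target: float = 0.5) -> float:
    """Hypothetical rule: choose gamma so the mean intensity maps to `target`.

    Solves mean ** gamma == target; falls back to 1.0 (no change) when the
    mean is 0 or 1 and the logarithm is undefined or zero.
    """
    mean = sum(pixels) / len(pixels)
    if not 0.0 < mean < 1.0:
        return 1.0
    return math.log(target) / math.log(mean)


# A dark image (mean 0.25) gets gamma < 1, which brightens it.
dark = [0.25, 0.25, 0.25, 0.25]
g = gamma_from_mean(dark)
print(g, gamma_correct(dark, g))
```

A gamma below 1 brightens an image and a gamma above 1 darkens it, which is why a single fixed gamma cannot suit both dark and bright inputs.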
Rajib Kumar Halder, Mohammed Nasir Uddin, Md. Ashraf Uddin, Sunil Aryal et al.
The k-Nearest Neighbors (kNN) method, established in 1951, has since evolved into a pivotal tool in data mining, recommendation systems, and Internet of Things (IoT), among other areas. This paper presents a comprehensive review and performance analysis of modifications made to enhance the exact kNN techniques, particularly focusing on kNN Search and kNN Join for high-dimensional data. We delve deep into 31 kNN search methods and 12 kNN join methods, providing a methodological overview and analytical insight into each, emphasizing their strengths, limitations, and applicability. An important feature of our study is the provision of the source code for each of the kNN methods discussed, fostering ease of experimentation and comparative analysis for readers. Motivated by the rising significance of kNN in high-dimensional spaces and a recognized gap in comprehensive surveys on exact kNN techniques, our work seeks to bridge this gap. Additionally, we outline existing challenges and present potential directions for future research in the domain of kNN techniques, offering a holistic guide that amalgamates, compares, and dissects existing methodologies in a coherent manner.
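As a point of reference for the two operations the survey covers, the following is a brute-force sketch of exact kNN search and kNN join under Euclidean distance. It is the naive baseline that the surveyed methods aim to improve on, not any specific method from the paper; the function names and tuple-based point representation are assumptions.

```python
import heapq
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_search(points: Sequence[Point], query: Point, k: int) -> List[int]:
    """Indices of the k points closest to `query`, nearest first (linear scan)."""
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: math.dist(points[i], query))


def knn_join(queries: Sequence[Point], points: Sequence[Point],
             k: int) -> List[List[int]]:
    """kNN join: for every query point, the kNN search result over `points`."""
    return [knn_search(points, q, k) for q in queries]


# Hypothetical 2-D data.
data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
print(knn_search(data, (0.0, 0.0), 2))
print(knn_join([(0.0, 0.0), (5.0, 5.0)], data, 1))
```

The linear scan costs one distance computation per stored point for every query, which is the cost that index-based and pruning-based exact methods try to reduce in high-dimensional data.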
Michaela Blott, Thomas B. Preußer, Nicholas J. Fraser, Giulio Gambardella et al.
Convolutional Neural Networks have rapidly become the most successful machine-learning algorithm, enabling ubiquitous machine vision and intelligent decisions on even embedded computing systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One of the promising opportunities is leveraging reduced-precision representations for inputs, activations, and model parameters. The resulting scalability in performance, power efficiency, and storage footprint provides interesting design compromises in exchange for a small reduction in accuracy. FPGAs are ideal for exploiting low-precision inference engines leveraging custom precisions to achieve the required numerical accuracy for a given application. In this article, we describe the second generation of the FINN framework, an end-to-end tool that enables design-space exploration and automates the creation of fully customized inference engines on FPGAs. Given a neural network description, the tool optimizes for given platforms, design targets, and a specific precision. We introduce formalizations of resource cost functions and performance predictions and elaborate on the optimization algorithms. Finally, we evaluate a selection of reduced precision neural networks ranging from CIFAR-10 classifiers to YOLO-based object detection on a range of platforms including PYNQ and AWS F1, demonstrating new unprecedented measured throughput at 50 TOp/s on AWS F1 and 5 TOp/s on embedded devices.
Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian et al.
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have drawn inspiration from image object detection and extended its advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Nevertheless, the results can suffer from robustness problems due to the design of predetermined temporal scales, which overlooks the temporal structure of an action and limits the utility for detecting actions with complex variations. In this paper, we propose to address the problem by introducing Gaussian kernels to dynamically optimize the temporal scale of each action proposal. Specifically, we present Gaussian Temporal Awareness Networks (GTAN), a new architecture that integrates the exploitation of temporal structure into a one-stage action localization framework. Technically, GTAN models the temporal structure through learning a set of Gaussian kernels, each for a cell in the feature maps. Each Gaussian kernel corresponds to a particular interval of an action proposal, and a mixture of Gaussian kernels could further characterize action proposals of various lengths. Moreover, the values in each Gaussian curve reflect the contextual contributions to the localization of an action proposal. Extensive experiments are conducted on both the THUMOS14 and ActivityNet v1.3 datasets, and superior results are reported compared to state-of-the-art approaches. More remarkably, GTAN achieves 1.9% and 1.1% improvements in mAP on the testing sets of the two datasets, respectively.
Zhaohui Liang, Andrew J. Powell, Ilker Ersoy, Mahdieh Poostchi et al.
Malaria is a major global health threat. The standard way of diagnosing malaria is by visually examining blood smears for parasite-infected red blood cells under the microscope by qualified technicians. This method is inefficient and the diagnosis depends on the experience and the knowledge of the person doing the examination. Automatic image recognition technologies based on machine learning have been applied to malaria blood smears for diagnosis before. However, the practical performance has not been sufficient so far. This study proposes a new and robust machine learning model based on a convolutional neural network (CNN) to automatically classify single cells in thin blood smears on standard microscope slides as either infected or uninfected. In a ten-fold cross-validation based on 27,578 single cell images, the average accuracy of our new 16-layer CNN model is 97.37%. A transfer learning model only achieves 91.99% on the same images. The CNN model shows superiority over the transfer learning model in all performance indicators such as sensitivity (96.99% vs 89.00%), specificity (97.75% vs 94.98%), precision (97.73% vs 95.12%), F1 score (97.36% vs 90.24%), and Matthews correlation coefficient (94.75% vs 85.25%).
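The performance indicators listed above all derive from the four counts of a binary confusion matrix. The sketch below shows the standard definitions; the counts in the example are hypothetical and are not the paper's data, and the function name is an assumption.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient, in [-1, 1].
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f1": f1, "mcc": mcc}


# Hypothetical counts: 100 infected and 100 uninfected cells.
for name, value in binary_metrics(tp=90, fp=20, tn=80, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Note that the Matthews correlation coefficient ranges from -1 to 1; the abstract reports it as a percentage, which corresponds to the value above multiplied by 100.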
Mahbuba Begum, Mohammad Shorif Uddin
Digital image authentication is an extremely significant concern for the digital revolution, as it is easy to tamper with any image. In the last few decades, it has been an urgent concern for researchers to ensure the authenticity of digital images. Based on the desired applications, several suitable watermarking techniques have been developed to mitigate this concern. However, it is tough to achieve a watermarking system that is simultaneously robust and secure. This paper gives details of standard watermarking system frameworks and lists some standard requirements that are used in designing watermarking techniques for several distinct applications. The current trends of digital image watermarking techniques are also reviewed in order to find the state-of-the-art methods and their limitations. Some conventional attacks are discussed, and future research directions are given.
Md. Eshmam Rayed, S. M. Sajibul Islam, Sadia Islam Niha, Jamin Rahman Jim et al.
Image segmentation, a crucial process of dividing images into distinct parts or objects, has witnessed remarkable advancements with the emergence of deep learning (DL) techniques. The use of layers in deep neural networks, with basic edge identification in lower layers and object form recognition in higher layers, has markedly improved the quality and accuracy of image segmentation. Consequently, DL-based image segmentation has become commonplace in applications such as video analysis and facial recognition. Grasping the applications, algorithms, current performance, and challenges is crucial for advancing DL-based medical image segmentation. However, there is a lack of studies delving into the latest state-of-the-art developments in this field. Therefore, this survey aimed to thoroughly explore the most recent applications of DL-based medical image segmentation, encompassing an in-depth analysis of various commonly used datasets, pre-processing techniques, and DL algorithms. This study also investigated the state-of-the-art advancements in DL-based medical image segmentation by analyzing their results and experimental details. Finally, this study discussed the challenges and future research directions of DL-based medical image segmentation. Overall, this survey provides a comprehensive insight into DL-based medical image segmentation by covering its application domains, model exploration, analysis of state-of-the-art results, challenges, and research directions, making it a valuable resource for multidisciplinary studies.
Md. Sabbir Ejaz, Md. Rabiul Islam, Md Sifatullah, Ananya Sarker
This paper presents an implementation of Principal Component Analysis (PCA) for masked and non-masked face recognition. Security is an essential concern in today's life. Among biometric technologies, face recognition is widely used to secure systems because it is considered better than traditional techniques such as PIN, password, and fingerprint, and reliable for identifying or verifying a person efficiently. In recent years, face recognition has become a very challenging task because of occlusions or masks such as sunglasses, scarves, hats, and different types of make-up or disguise. These types of masks affect the accuracy of face recognition. Many algorithms developed recently for non-masked face recognition are widely used and perform well. In the field of masked face recognition, however, few contributions have been made. Therefore, in this work a statistical procedure that is used in non-masked face recognition is also applied to masked face recognition. PCA is an effective, successful, and widely used statistical technique, and for this reason the PCA algorithm has been chosen in this work. Finally, a comparative study is also presented for better understanding.
Mst. Alema Khatun, Mohammad Abu Yousuf, Sabbir Ahmed, Md. Zia Uddin et al.
Human Activity Recognition (HAR) systems are devised for continuously observing human behavior, primarily in the fields of environmental compatibility, sports injury detection, senior care, rehabilitation, entertainment, and surveillance in intelligent home settings. Inertial sensors, e.g., accelerometers, linear acceleration sensors, and gyroscopes, are frequently employed for this purpose and are now built into smart devices such as smartphones. Since the use of smartphones is so widespread nowadays, activity data acquisition for HAR systems is a pressing need. In this article, we have conducted smartphone sensor-based raw data collection, namely H-Activity, using an Android-OS-based application for accelerometer, gyroscope, and linear acceleration. Furthermore, a hybrid deep learning model is proposed, coupling a convolutional neural network and a long short-term memory network (CNN-LSTM), empowered by the self-attention algorithm to enhance the predictive capabilities of the system. In addition to our collected dataset (H-Activity), the model has been evaluated with some benchmark datasets, e.g., MHEALTH and UCI-HAR, to demonstrate the comparative performance of our model. When compared to other models, the proposed model has an accuracy of 99.93% using our collected H-Activity data, and 98.76% and 93.11% using data from the MHEALTH and UCI-HAR databases respectively, indicating its efficacy in recognizing human activity. We hope that our developed model could be applicable in clinical settings and that the collected data could be useful for further research.
Mohammad Ashfak Habib, Mas Sahidayana Mohktar, Shahrul Bahyah Kamaruzzaman, Kheng Seang Lim et al.
This paper presents a state-of-the-art survey of smartphone (SP)-based solutions for fall detection and prevention. Falls are considered major health hazards for both the elderly and people with neurodegenerative diseases. To mitigate the adverse consequences of falling, a great deal of research has been conducted, mainly focused on two different approaches, namely, fall detection and fall prevention. The hardware required for both fall detection and prevention is also available in SPs. Consequently, researchers' interest in finding SP-based solutions has increased dramatically over recent years. To the best of our knowledge, there has been no published review on SP-based fall detection and prevention. Thus, in this paper, we present a taxonomy for SP-based fall detection and prevention solutions and systematic comparisons of existing studies. We have also identified three challenges and three open issues for future research after reviewing the existing articles. Our time series analysis demonstrates a trend towards the integration of external sensing units with SPs to improve the usability of the systems.
Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu et al.
Temporal action proposal generation is an important task, aiming to localize the video segments containing human actions in an untrimmed video. In this paper, we propose a multi-granularity generator (MGG) to perform temporal action proposal generation from different granularity perspectives, relying on video visual features equipped with position embedding information. First, we propose to use a bilinear matching model to exploit the rich local information within the video sequence. Afterwards, two components, namely the segment proposal producer (SPP) and the frame actionness producer (FAP), are combined to perform the task of temporal action proposal at two distinct granularities. SPP considers the whole video in the form of a feature pyramid and generates segment proposals from a coarse perspective, while FAP carries out a finer actionness evaluation for each video frame. Our proposed MGG can be trained in an end-to-end fashion. By temporally adjusting the segment proposals with fine-grained information based on frame actionness, MGG achieves superior performance over state-of-the-art methods on the public THUMOS-14 and ActivityNet-1.3 datasets. Moreover, we employ existing action classifiers to classify the proposals generated by MGG, leading to significant improvements over the competing methods for the video detection task.
Zhaofan Qiu, Ting Yao, Chong‐Wah Ngo, Xinmei Tian et al.
Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback becomes even worse for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework to boost spatio-temporal representation learning by Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. The diffusions effectively let the two aspects of information, i.e., localized and holistic, interact for a more powerful way of representation learning. Furthermore, a kernelized classifier is introduced to combine the representations from the two aspects for video recognition. Our LGD networks achieve clear improvements on the large-scale Kinetics-400 and Kinetics-600 video classification datasets against the best competitors by 3.5% and 0.7%. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection tasks. Superior performance over several state-of-the-art techniques on these benchmarks is reported.
Nadeem Ahmed, Jahir Ibna Rafiq, Md. Rashedul Islam
Human activity recognition (HAR) techniques play a significant role in monitoring the daily activities of human life in areas such as elderly care, investigation activities, healthcare, sports, and smart homes. Smartphones equipped with a variety of motion sensors, such as accelerometers and gyroscopes, are widely used inertial sensing devices that can identify different physical conditions of humans. Much recent research has addressed human activity recognition. Smartphone sensor data produce high-dimensional feature vectors for identifying human activities. However, not all of the vectors contribute equally to the identification process, and including all feature vectors gives rise to the phenomenon known as the 'curse of dimensionality'. This research proposes a hybrid feature selection process that includes a filter method and a wrapper method. The process uses a sequential floating forward search (SFFS) to extract the desired features for better activity recognition. The features are then fed to a multiclass support vector machine (SVM) to create nonlinear classifiers by adopting the kernel trick for training and testing purposes. We validated our model with a benchmark dataset. Our proposed system works efficiently with limited hardware resources and provides satisfactory activity identification.
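The search step can be sketched as a generic sequential floating forward search: after each forward addition, features are conditionally removed as long as doing so beats the best subset previously seen at that smaller size. This is a textbook-style sketch, not the paper's pipeline; the `score` callable stands in for whatever wrapper criterion is used (for example, cross-validated classifier accuracy), and the toy additive score in the example is purely hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set


def sffs(n_features: int, target_size: int,
         score: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Sequential floating forward search over feature indices 0..n_features-1."""
    if not 0 < target_size <= n_features:
        raise ValueError("target_size must be in 1..n_features")
    selected: Set[int] = set()
    best_score: Dict[int, float] = {}          # best score seen per subset size
    best_subset: Dict[int, FrozenSet[int]] = {}
    while len(selected) < target_size:
        # Forward step: add the feature that maximizes the criterion.
        candidates = [f for f in range(n_features) if f not in selected]
        add = max(candidates, key=lambda f: score(frozenset(selected | {f})))
        selected.add(add)
        current = score(frozenset(selected))
        if current > best_score.get(len(selected), float("-inf")):
            best_score[len(selected)] = current
            best_subset[len(selected)] = frozenset(selected)
        # Floating step: drop features while that improves on the best
        # subset recorded for the smaller size.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: score(frozenset(selected - {f})))
            reduced = score(frozenset(selected - {drop}))
            if reduced <= best_score.get(len(selected) - 1, float("-inf")):
                break
            selected.remove(drop)
            best_score[len(selected)] = reduced
            best_subset[len(selected)] = frozenset(selected)
    return sorted(best_subset[target_size])


# Toy criterion (hypothetical): each feature has a fixed usefulness weight.
weights = [0.1, 0.9, 0.5, 0.7]
print(sffs(4, 2, lambda s: sum(weights[f] for f in s)))
```

With a purely additive criterion the floating step never fires and the search reduces to plain forward selection; the backward moves matter when features are redundant or only useful in combination.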
Prajoy Podder, Tanvir Zaman Khan, Mamdudul Haque Khan, Mezbahur Rahman
In medical image processing, computer vision, pattern recognition, and other digital signal processing applications, the window technique is widely used. A window function is a mathematical function that is zero-valued outside some chosen interval. When another function is multiplied by a window function, the product is also zero-valued outside the interval. In this paper, the performance of the Hamming, Hanning, and Blackman windows is compared in terms of magnitude response, phase response, equivalent noise bandwidth, sidelobe transition width, and response in the time and frequency domains using MATLAB simulation. To observe the responses, FIR filters of low-pass, high-pass, band-pass, and band-stop type were designed and evaluated against each of the parameters stated above. The results agree with those expected from theory. Comparing the simulation results for the different windows, this paper finds that the Blackman window performs best among them, which is also expected from theory. These windows were also applied to a speech signal using MATLAB simulation, with the same expected result.
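The three windows compared here are all short cosine sums, which the following sketch generates using their standard textbook coefficients. The paper itself uses MATLAB; Python and the helper names below are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def _cosine_sum_window(n_points: int, coeffs: Sequence[float]) -> List[float]:
    """w[n] = a0 - a1*cos(x) + a2*cos(2x) - ..., with x = 2*pi*n/(N-1)."""
    if n_points < 1:
        raise ValueError("window length must be positive")
    if n_points == 1:
        return [1.0]
    window = []
    for n in range(n_points):
        x = 2.0 * math.pi * n / (n_points - 1)
        window.append(sum((-1) ** k * a * math.cos(k * x)
                          for k, a in enumerate(coeffs)))
    return window


def hann(n_points: int) -> List[float]:
    return _cosine_sum_window(n_points, (0.5, 0.5))


def hamming(n_points: int) -> List[float]:
    return _cosine_sum_window(n_points, (0.54, 0.46))


def blackman(n_points: int) -> List[float]:
    return _cosine_sum_window(n_points, (0.42, 0.5, 0.08))


def apply_window(signal: Sequence[float], window: Sequence[float]) -> List[float]:
    """Element-wise product; samples outside the window are implicitly zero."""
    return [s * w for s, w in zip(signal, window)]


print([round(w, 3) for w in hann(5)])
print([round(w, 3) for w in hamming(5)])
print([round(w, 3) for w in blackman(5)])
```

The Hann and Blackman windows taper to zero at both ends, whereas the Hamming window ends at 0.08; the extra cosine term in the Blackman window is what buys lower sidelobes at the cost of a wider main lobe.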
Mohaimenul Azam Khan Raiaan, Sadman Sakib, Nur Mohammad Fahad, Abdullah Al Mamun et al.
Convolutional Neural Networks (CNNs) are a prevalent topic in deep learning (DL) research because of their architectural advantages. CNNs rely heavily on hyperparameter configurations, and manually tuning these hyperparameters can be time-consuming for researchers; efficient optimization techniques are therefore needed. In this systematic review, we explore a range of widely used algorithms, including metaheuristic, statistical, sequential, and numerical approaches, to fine-tune CNN hyperparameters. Our research offers an exhaustive categorization of these hyperparameter optimization (HPO) algorithms and investigates the fundamental concepts of CNNs, explaining the role of hyperparameters and their variants. Furthermore, an exhaustive literature review of HPO in CNNs employing the above-mentioned algorithms is undertaken. A comparative analysis is conducted based on their HPO strategies, error evaluation approaches, and accuracy results across various datasets to assess the efficacy of these methods. In addition to addressing current challenges in HPO, our research illuminates unresolved issues in the field. By providing insightful evaluations of the merits and demerits of various HPO algorithms, our objective is to assist researchers in determining a suitable method for a particular problem and dataset. By highlighting future research directions and synthesizing diversified knowledge, our survey contributes significantly to the ongoing development of CNN hyperparameter optimization.