Md Farhan Ishmam, Md Sakib Hossain Shovon, M. F. Mridha, Nilanjan Dey
The multimodal task of Visual Question Answering (VQA), encompassing elements of Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers to questions on any visual input. Over time, the scope of VQA has expanded from datasets focusing on an extensive collection of natural...
Md Farhan Ishmam, Ishmam Tashdeed, Talukder Asir Saadat, Md. Hamjajul Ashmafee et al.
Can Visual Question Answering (VQA) systems maintain their performance when deployed in the real world? Or are they susceptible to realistic corruption effects, e.g., image blur, which can be detrimental in sensitive applications such as medical VQA? While linguistic robustness has been thoroughly e...
Deeparghya Dutta Barua, Md Sakib Ul Rahman Sourove, Md Fahim, Fabiha Haider et al.
Ishmam Ahmed Ongshu, Ahmed Wasif Reza, Md. Emad Uddin Aksir, Md Aftab Alam et al.
Advances in networking technology have made Distributed Denial of Service (DDoS) attacks a real danger to today’s networks. Using logical reasoning, the network flow circumstances may be classified as an attack or a routine state to mimic DDoS detection. This research builds an Artificial Intelligen...
Md Fahim, Md Sakib Ul Rahman, Akm Moshiur Rahman, Md Farhan Ishmam et al.
The advanced multimodal processing of current vision language models (VLMs) has prompted rigorous benchmarking across multicultural settings, revealing a clear inclination toward Western culture. While the bias likely stems from the predominance of Western-centric images in the VLM pretraining data,...
A K M Sarwar, Md Riyadh Arefin, M Farhan Ishmam, M Ashrafuzzaman
Obesity, a global health issue affecting 650 million people, leads to chronic diseases and health impairments. Anti-obesity drugs are expensive and may cause side effects, raising significant concerns. One hundred eighty-eight medicinal plant species from 157 genera and 62 families in Bangladesh exh...
Md Fahim, Md Farhan Ishmam, Mir Sazzat Hossain, M Ashraful Amin et al.
Pre-trained vision-language models (VLMs) such as CLIP exhibit strong generalization but struggle with few-shot adaptation due to the trade-off between gaining task-specific knowledge and preserving general performance. While multimodal adapters add trainable modules that improve alignment and excel...
Sameer Shafayet Latif, Sadab Shiper, K. M. Rahiduzzaman Kiran, Md Farhan Ishmam et al.
While the current generation of Vision Language Models (VLMs) has excelled in ideal conditions, their performance drops significantly when exposed to realistic multimodal corruptions, such as blurry images and grammatically incorrect text. Our work addresses this by establishing a novel multimodal c...