Vatsa, Mayank
Preferred Name: Vatsa, Mayank
Alternative Name: Vatsa, M.
Scopus Author ID: 55908650100
Researcher ID: I-5050-2013
Publications (7 of 7)
- Publication: AssistDistil for Medical Image Segmentation (2024)
Mahapara Khurshid; Yasmeena Akhter; et al.
Deep learning models have demonstrated significant effectiveness in intricate object segmentation and image classification tasks. Nevertheless, their widespread use is impeded by high computational demands, which limits their applicability on resource-constrained devices and in contexts such as medical image segmentation. This paper proposes AssistDistil, a semi-knowledge distillation technique designed to transfer knowledge from a larger teacher network to a more compact student model. During inference, the student model works in conjunction with the teacher model by condensing the teacher's latent information into its own latent representation, thereby boosting its representational capacity. The effectiveness of the proposed approach is demonstrated on multiple medical image segmentation case studies: eye segmentation, skin lesion segmentation, and chest X-ray segmentation. Experimental results on the IIITD Cataract Surgery, HAM10000, PH2, Shenzhen, and Montgomery chest X-ray datasets demonstrate the efficacy of the approach in terms of both accuracy and computational cost. For example, compared to the AUNet-based teacher model, the proposed approach achieves a similar mIoU with only 0.5% of the model size. In the future, we plan to explore knowledge distillation approaches that improve the distillation process when there is a large capacity gap between teacher and student networks. With fewer parameters, we intend for the student model to attain performance comparable to that of the teacher model without additional assistance.
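The core idea, aligning a compact student's latent representation with a larger teacher's, can be illustrated with a minimal PyTorch sketch. This is not the AssistDistil implementation: the toy encoder-decoder networks, the 1x1 projection layer, and the 0.5 loss weight are illustrative assumptions.

```python
# Minimal sketch of teacher-to-student latent distillation (assumed setup,
# not the AssistDistil implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallSegNet(nn.Module):
    """Toy encoder-decoder; stands in for a segmentation network."""
    def __init__(self, width=16):
        super().__init__()
        self.enc = nn.Conv2d(3, width, 3, padding=1)
        self.dec = nn.Conv2d(width, 1, 3, padding=1)
    def forward(self, x):
        z = F.relu(self.enc(x))          # latent representation
        return self.dec(z), z            # segmentation logits + latent

teacher = SmallSegNet(width=64)          # pretrained and frozen in practice
student = SmallSegNet(width=16)
proj = nn.Conv2d(64, 16, 1)              # condense teacher latent to student width

x = torch.randn(2, 3, 128, 128)          # dummy image batch
mask = torch.randint(0, 2, (2, 1, 128, 128)).float()

with torch.no_grad():
    t_logits, t_latent = teacher(x)      # teacher provides latent guidance
s_logits, s_latent = student(x)

seg_loss = F.binary_cross_entropy_with_logits(s_logits, mask)
distill_loss = F.mse_loss(s_latent, proj(t_latent))  # latent alignment term
loss = seg_loss + 0.5 * distill_loss                 # 0.5 is an assumed weight
loss.backward()
```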
- Publication: On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare (2024)
Surbhi Mittal; Kartik Thakral; Tamar Glaser; Cristian Canton Ferrer; Tal Hassner; et al.
Artificial Intelligence (AI) has seamlessly integrated into numerous scientific domains, catalysing unparalleled enhancements across a broad spectrum of tasks; however, its integrity and trustworthiness have emerged as notable concerns. The scientific community has focused on the development of trustworthy AI algorithms; however, machine learning and deep learning algorithms, popular in the AI community today, intrinsically rely on the quality of their training data. These algorithms are designed to detect patterns within the data, thereby learning the intended behavioural objectives. Any inadequacy in the data has the potential to translate directly into the algorithms. In this study we discuss the importance of responsible machine learning datasets through the lens of fairness, privacy and regulatory compliance, and present a large audit of computer vision datasets. Despite the ubiquity of fairness and privacy challenges across diverse data domains, current regulatory frameworks primarily address human-centric data concerns. We therefore focus our discussion on biometric and healthcare datasets, although the principles we outline are broadly applicable across various domains. The audit is conducted by evaluating datasets against the proposed responsibility rubric. After surveying over 100 datasets, our detailed analysis of 60 distinct datasets highlights a universal susceptibility to fairness, privacy and regulatory compliance issues. This finding emphasizes the urgent need to revise dataset creation methodologies within the scientific community, especially in light of global advancements in data protection legislation. We assert that our study is critically relevant in the contemporary AI context, offering insights and recommendations that are both timely and essential for the ongoing evolution of AI technologies.
- Publication: Low-Quality Deepfake Detection via Unseen Artifacts (2024)
Saheb Chhabra; Kartik Thakral; Surbhi Mittal; et al.
The proliferation of manipulated media over the Internet has become a major source of concern in recent times. With the wide variety of techniques being used to create fake media, it has become increasingly difficult to identify such occurrences. While existing algorithms perform well in detecting such media, few take the impact of compression into account. Different social media platforms apply different compression factors and algorithms before images and videos are shared, which amplifies the difficulty of identification. It has therefore become imperative that fake-media detection algorithms work well for data compressed at different factors. To this end, this article focuses on detecting low-quality fake videos in the compressed domain. The proposed algorithm distinguishes real images and videos from altered ones by using a learned visibility matrix, which forces the model to attend to imperceptible artifacts in the data. As a result, the learned model is robust to the loss of information caused by data compression. The performance is evaluated on three publicly available datasets, namely Celeb-DF, FaceForensics, and FaceForensics++, with three manipulation techniques, viz., Deepfakes, Face2Face, and FaceSwap. Experimental results show that the proposed approach is robust under different compression factors and yields state-of-the-art performance on the FaceForensics++ and Celeb-DF datasets, with 97.14% classification accuracy and 74.45% area under the curve, respectively.
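One plausible reading of a "learned visibility matrix" is a trainable element-wise weighting applied to the input before the detector, so that training can amplify imperceptible, compression-robust artifacts. The sketch below illustrates that reading only; the gate placement, sigmoid bounding, and toy detector are assumptions, not the paper's formulation.

```python
# Sketch of a learnable per-pixel "visibility" gate in front of a detector
# (an assumed interpretation, not the paper's exact method).
import torch
import torch.nn as nn

class VisibilityGate(nn.Module):
    def __init__(self, height, width):
        super().__init__()
        # One learnable weight per pixel location, shared across channels.
        self.v = nn.Parameter(torch.ones(1, 1, height, width))
    def forward(self, x):
        return x * torch.sigmoid(self.v)   # bounded per-pixel emphasis

detector = nn.Sequential(
    VisibilityGate(64, 64),
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),                       # real-vs-fake logits
)
frames = torch.randn(4, 3, 64, 64)          # dummy compressed frames
logits = detector(frames)
print(logits.shape)                         # torch.Size([4, 2])
```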
- Publication: BirdCollect: A Comprehensive Benchmark for Analyzing Dense Bird Flock Attributes (2024)
Kshitiz; Sonu Shreshtha; Bikash Dutta; Muskan Dosi; Saket Anand; Sudeep Sarkar; Sevaram Mali Parihar; et al.
Automatic recognition of bird behavior from long-term, uncontrolled outdoor imagery can contribute to conservation efforts by enabling large-scale monitoring of bird populations. Current techniques in AI-based wildlife monitoring have focused on short-term tracking and on monitoring birds individually rather than in species-rich flocks. We present BirdCollect, a comprehensive benchmark dataset for monitoring dense bird flock attributes. It includes a unique collection of more than 6,000 high-resolution images of Demoiselle Cranes (Anthropoides virgo) feeding and nesting in the vicinity of the Khichan region of Rajasthan. Notably, each image contains an average of 190 individual birds, illustrating the complex dynamics of densely populated bird flocks at a scale that has not previously been studied. In addition, a total of 433 distinct images captured at Keoladeo National Park, Bharatpur, provide a comprehensive representation of 34 distinct bird species belonging to various taxonomic groups. These images offer insight into the diversity and behaviour of birds in a vital natural ecosystem along the migratory flyways. Additionally, we provide a set of 2,500 point-annotated samples that serve as ground truth for benchmarking various computer vision tasks such as crowd counting, density estimation, segmentation, and species classification. The benchmark performance on these tasks highlights the need for approaches tailored to specific wildlife applications under varied conditions of viewpoint, illumination, and resolution. At around 46.2 GB, encompassing data collected from two distinct nesting grounds, it is the largest bird dataset with detailed annotations, representing a substantial leap in bird research possibilities.
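For the crowd-counting and density-estimation tasks mentioned above, point annotations are conventionally converted into density maps whose integral equals the bird count. The sketch below shows that standard conversion; the Gaussian kernel width and image size are assumed hyperparameters, not values from the paper.

```python
# Sketch: point annotations (one dot per bird) -> density-map ground truth.
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """points: (N, 2) array of (row, col) bird locations; sigma is assumed."""
    dm = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        dm[int(r) % shape[0], int(c) % shape[1]] += 1.0  # one unit of mass per bird
    return gaussian_filter(dm, sigma=sigma)             # spread each dot into a blob

pts = np.random.rand(190, 2) * 512      # ~190 birds per image, as in the dataset
dm = density_map(pts, (512, 512))
print(round(dm.sum()))                   # ~190: the integral preserves the count
```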
- Publication: Corruption depth: Analysis of DNN depth for misclassification (2024)
Akshay Agarwal; Nalini Ratha; et al.
Many large and complex deep neural networks have been shown to provide higher performance on various computer vision tasks. However, very little is known about the relationship between the complexity of the input data, the type of noise, and the depth needed for correct classification. Existing studies do not adequately address common corruptions, especially in understanding what impact these corruptions have on the individual parts of a deep neural network. We can therefore safely assume that classification (or misclassification) might happen at particular layers of a network, accumulating into a final correct or incorrect prediction. In this paper, we introduce the novel concept of corruption depth, which identifies the network depth up to which a misclassification persists. We assert that identifying such layers will help in better designing the network by pruning certain layers, rather than purifying the entire network, which is computationally heavy. Through extensive experiments, we present a coherent study of how examples are processed through the network. Our approach also illustrates different philosophies of example memorization and offers a one-dimensional view of sample or query difficulty. We believe that understanding corruption depth can open a new dimension of model explainability and model compression, where instead of just visualizing an attention map, the classification progress can be traced throughout the network.
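One way to make the notion of "depth at which a misclassification persists" concrete is to probe each block's intermediate features with a small classifier and record the first depth at which a corrupted input is classified correctly. This is an illustrative reading under assumed toy blocks and untrained probes, not the authors' procedure.

```python
# Sketch of per-depth probing for "corruption depth" (assumed illustration).
import torch
import torch.nn as nn

blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3 if i == 0 else 16, 16, 3, padding=1), nn.ReLU())
    for i in range(4)
])
probes = nn.ModuleList([nn.Linear(16, 10) for _ in blocks])  # 10 classes assumed

def corruption_depth(x, label):
    """Return the first block index whose probe predicts `label`, else None."""
    h = x
    for d, (block, probe) in enumerate(zip(blocks, probes)):
        h = block(h)
        pred = probe(h.mean(dim=(2, 3))).argmax(dim=1)  # pooled linear probe
        if (pred == label).all():
            return d          # misclassification stops persisting here
    return None               # wrong at every probed depth

x = torch.randn(1, 3, 32, 32)                # stand-in corrupted image
print(corruption_depth(x, torch.tensor([3])))
```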
- Publication: Adventures of Trustworthy Vision-Language Models: A Survey (2024)
Anubhooti Jain; et al.
Recently, transformers have become incredibly popular in computer vision and vision-language tasks. This notable rise in their usage can be attributed primarily to the capabilities offered by attention mechanisms and the outstanding ability of transformers to adapt to a variety of tasks and domains. Their versatility and state-of-the-art performance have established them as indispensable tools for a wide array of applications. However, in the constantly changing landscape of machine learning, assuring the trustworthiness of transformers is of utmost importance. This paper conducts a thorough examination of vision-language transformers through three fundamental principles of responsible AI: bias, robustness, and interpretability. The primary objective is to delve into the intricacies and complexities associated with the practical use of transformers, with the overarching goal of advancing our comprehension of how to enhance their reliability and accountability.
- Publication: D-LORD: DYSL-AI Database for Low-Resolution Disguised Face Recognition (2024)
Sunny Manchanda; Kaushik Bhagwatkar; Kavita Balutia; Shivang Agarwal; Jyoti Chaudhary; Muskan Dosi; Chiranjeev Chiranjeev; et al.
Face recognition in a low-resolution video stream captured by a surveillance camera is a challenging problem. The problem becomes even more complicated when the subjects appearing in the video wear disguise artifacts to hide their identity or try to impersonate someone. The lack of labeled datasets has restricted research on low-resolution face recognition under disguise. With this paper, we propose a large-scale database, D-LORD, to facilitate research on face recognition. The proposed D-LORD dataset includes high-resolution mugshot images of 2,100 individuals and 14,098 low-resolution surveillance videos, collectively containing over 1.2 million frames. Each frame in the dataset has been annotated with five facial keypoints and a single bounding box per face. In the videos, subjects' faces are occluded by various disguise artifacts, such as face masks, sunglasses, wigs, hats, and monkey caps. To the best of our knowledge, D-LORD is the first database to address the complex problem of low-resolution face recognition with disguise variations. We also establish benchmark results for several state-of-the-art face detectors, frame selection algorithms, face restoration methods, and face verification algorithms using well-structured experimental protocols on the D-LORD dataset. The findings indicate that the Genuine Acceptance Rate (GAR) at 1% False Acceptance Rate (FAR) varies between 86.44% and 49.45% across different disguises and distances. The dataset is publicly available to the research community at https://dyslai.org/datasets/D-LORD/.
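The verification metric reported above, GAR at a fixed 1% FAR, is standard: pick the score threshold that admits 1% of impostor comparisons, then measure the fraction of genuine comparisons accepted at that threshold. A minimal sketch follows; the synthetic Gaussian score distributions are illustrative only.

```python
# Sketch: Genuine Acceptance Rate at a fixed False Acceptance Rate,
# computed from genuine and impostor similarity scores (synthetic here).
import numpy as np

def gar_at_far(genuine, impostor, far=0.01):
    # Threshold = the impostor-score quantile that admits `far` of impostors.
    thr = np.quantile(impostor, 1.0 - far)
    return float((genuine >= thr).mean())

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 10_000)    # matched (same-identity) scores
impostor = rng.normal(0.4, 0.1, 100_000)  # non-matched (different-identity) scores
print(f"GAR@1%FAR = {gar_at_far(genuine, impostor):.2%}")
```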