Vatsa, Mayank
Preferred name: Vatsa, Mayank
Alternative Name: Vatsa, M.
Scopus Author ID: 55908650100
Researcher ID: I-5050-2013
Showing 1-10 of 20 publications
- Publication: SynthProv: Interpretable Framework for Profiling Identity Leakage (2024)
  Jaisidh Singh; Harshil Bhatia; Aparna Bharati
  Generative Adversarial Networks (GANs) can generate hyperrealistic face images of synthetic identities based on a latent understanding of real images from a large training set. Despite their proficiency, the term "synthetic identity" remains ambiguous, and the uniqueness of the faces GANs produce is rarely assessed. Recent studies have found that identities from the training data can unintentionally appear in the faces generated by StyleGAN2, but the cause of this phenomenon is unclear. In this work, we propose a novel framework, SynthProv, that utilizes the improved interpolation ability of the StyleGAN2 latent space and employs image composition to analyze leakage. This is the first method that goes beyond detection and traces the source, or provenance, of constituent identity signals in the generated image. Experiments show that SynthProv succeeds in both detection and provenance tasks using multiple matching strategies. We identify the identities from the FFHQ and CelebA-HQ training datasets with the highest leakage into the latent space as "leaking reals". Analyzing latent space behavior to evaluate generative model privacy via leakage is an important research direction, as undetected leaking reals pose a significant threat to training data privacy. Our code is available at https://github.com/jaisidhsingh/SynthProv.

- Publication: Low-Quality Deepfake Detection via Unseen Artifacts (2024)
  Saheb Chhabra; Kartik Thakral; Surbhi Mittal
  The proliferation of manipulated media over the Internet has become a major source of concern in recent times. With the wide variety of techniques being used to create fake media, it has become increasingly difficult to identify such occurrences. While existing algorithms perform well on the detection of such media, few take the impact of compression into account. Different social media platforms apply different compression factors and algorithms before sharing images and videos, which amplifies the difficulty of identification. It has therefore become imperative that fake media detection algorithms work well for data compressed at different factors. To this end, the focus of this article is detecting low-quality fake videos in the compressed domain. The proposed algorithm distinguishes real images and videos from altered ones by using a learned visibility matrix, which forces the model to attend to otherwise imperceptible artifacts in the data. As a result, the learned model is robust to the loss of information caused by data compression. The performance is evaluated on three publicly available datasets, namely Celeb-DF, FaceForensics, and FaceForensics++, with three manipulation techniques, viz., Deepfakes, Face2Face, and FaceSwap. Experimental results show that the proposed approach is robust under different compression factors and yields state-of-the-art performance on the FaceForensics++ and Celeb-DF datasets, with 97.14% classification accuracy and 74.45% area under the curve, respectively.

- Publication: BirdCollect: A Comprehensive Benchmark for Analyzing Dense Bird Flock Attributes (2024)
  Kshitiz; Sonu Shreshtha; Bikash Dutta; Muskan Dosi; Saket Anand; Sudeep Sarkar; Sevaram Mali Parihar
  Automatic recognition of bird behavior from long-term, uncontrolled outdoor imagery can contribute to conservation efforts by enabling large-scale monitoring of bird populations. Current techniques in AI-based wildlife monitoring have focused on short-term tracking and on monitoring birds individually rather than in species-rich flocks. We present BirdCollect, a comprehensive benchmark dataset for monitoring dense bird flock attributes. It includes a unique collection of more than 6,000 high-resolution images of Demoiselle Cranes (Anthropoides virgo) feeding and nesting in the vicinity of the Khichan region of Rajasthan. Notably, each image contains an average of 190 individual birds, illustrating the complex dynamics of densely populated bird flocks at a scale that has not previously been studied. In addition, a total of 433 distinct pictures captured at Keoladeo National Park, Bharatpur provide a comprehensive representation of 34 distinct bird species belonging to various taxonomic groups. These images offer insights into the diversity and behaviour of birds in a vital natural ecosystem along the migratory flyways. Additionally, we provide a set of 2,500 point-annotated samples which serve as ground truth for benchmarking various computer vision tasks such as crowd counting, density estimation, segmentation, and species classification. The benchmark performance on these tasks highlights the need for approaches tailored to specific wildlife applications, covering varied conditions of viewpoint, illumination, and resolution. At around 46.2 GB in size, encompassing data collected from two distinct nesting grounds, it is the largest bird dataset with detailed annotations, representing a substantial leap in bird research possibilities.

- Publication: Corruption depth: Analysis of DNN depth for misclassification (2024)
  Akshay Agarwal; Nalini Ratha
  Many large and complex deep neural networks have been shown to provide higher performance on various computer vision tasks. However, very little is known about the relationship between the complexity of the input data, the type of noise, and the depth needed for correct classification. Existing studies do not adequately address the issue of common corruptions, especially in understanding what impact these corruptions have on individual parts of a deep neural network. We can therefore assume that classification (or misclassification) might happen at particular layer(s) of a network, accumulating into a final correct or incorrect prediction. In this paper, we introduce the novel concept of corruption depth, which identifies the network layer/depth up to which a misclassification persists. We assert that identifying such layers will help in better designing the network by pruning certain layers, in contrast to purifying the entire network, which is computationally heavy. Through extensive experiments, we present a coherent study of how examples are processed through the network. Our approach also illustrates different philosophies of example memorization and a one-dimensional view of sample or query difficulty. We believe that understanding corruption depth can open a new dimension of model explainability and model compression, where instead of just visualizing the attention map, the progress of classification can be seen throughout the network.

- Publication: Adventures of Trustworthy Vision-Language Models: A Survey (2024)
  Anubhooti Jain
  Recently, transformers have become incredibly popular in computer vision and vision-language tasks. This notable rise in their usage can be primarily attributed to the capabilities offered by attention mechanisms and the outstanding ability of transformers to adapt to a variety of tasks and domains. Their versatility and state-of-the-art performance have established them as indispensable tools for a wide array of applications. However, in the constantly changing landscape of machine learning, assuring the trustworthiness of transformers is of utmost importance. This paper conducts a thorough examination of vision-language transformers through three fundamental principles of responsible AI: bias, robustness, and interpretability. The primary objective of this paper is to delve into the intricacies and complexities associated with the practical use of transformers, with the overarching goal of advancing our understanding of how to enhance their reliability and accountability.

- Publication: D-LORD: DYSL-AI Database for Low-Resolution Disguised Face Recognition (2024)
  Sunny Manchanda; Kaushik Bhagwatkar; Kavita Balutia; Shivang Agarwal; Jyoti Chaudhary; Muskan Dosi; Chiranjeev Chiranjeev
  Face recognition in a low-resolution video stream captured from a surveillance camera is a challenging problem. The problem becomes even more complicated when the subjects appearing in the video wear disguise artifacts to hide their identity or try to impersonate someone. The lack of labeled datasets restricts current research on low-resolution face recognition under disguise. With this paper, we propose a large-scale database, D-LORD, that will facilitate research on face recognition. The proposed D-LORD dataset includes high-resolution mugshot images of 2,100 individuals and 14,098 low-resolution surveillance videos, collectively containing over 1.2 million frames. Each frame in the dataset has been annotated with five facial keypoints and a single bounding box for each face. In the videos, subjects' faces are occluded by various disguise artifacts, such as face masks, sunglasses, wigs, hats, and monkey caps. To the best of our knowledge, D-LORD is the first database to address the complex problem of low-resolution face recognition with disguise variations. We also establish benchmark results for several state-of-the-art face detectors, frame selection algorithms, face restoration methods, and face verification algorithms using well-structured experimental protocols on the D-LORD dataset. The research findings indicate that the Genuine Acceptance Rate (GAR) at 1% False Acceptance Rate (FAR) varies between 86.44% and 49.45% across different disguises and distances. The dataset is publicly available to the research community at https://dyslai.org/datasets/D-LORD/.

- Publication: DeePhyNet: Toward Detecting Phylogeny in Deepfakes (2025)
  Kartik Thakral; Harsh Agarwal; Kartik Narayan; Surbhi Mittal
  Deepfakes have rapidly evolved from their inception as a niche technology into a formidable tool for creating hyper-realistic manipulated content. With the ability to convincingly manipulate videos, images, and audio, deepfake technology can be used to create fake news, impersonate individuals, or even fabricate events, posing significant threats to public trust and societal stability. The technology has already been used to generate deepfakes for a number of the above-listed applications. Extending these complexities, this paper introduces the concept of deepfake phylogeny: multiple deepfake generation algorithms can be applied sequentially to create deepfakes in a phylogenetic manner. In such a scenario, deepfake detection, ingredient model signature detection, and phylogeny sequence detection performance all have to be optimized. To address the challenge of detecting such deepfakes, we propose DeePhyNet, which performs three tasks: it first differentiates between real and fake content; it next determines the signature of the generative algorithm used for deepfake creation; and finally, it predicts the phylogeny of the algorithms used for generation. To the best of our knowledge, this is the first algorithm that performs all three tasks together for deepfake media analysis. Another contribution of this research is the DeePhyV2 database, which incorporates multiple deepfake generation algorithms, including recently proposed diffusion models, and longer phylogenetic sequences. It consists of 8,960 deepfake videos generated using four different generation techniques. Results on multiple protocols and comparisons with state-of-the-art algorithms demonstrate that the proposed algorithm yields the highest overall classification results across all three tasks.

- Publication: AssistDistil for Medical Image Segmentation (2024)
  Mahapara Khurshid; Yasmeena Akhter
  Deep learning models have demonstrated significant effectiveness in addressing intricate object segmentation and image classification tasks. Nevertheless, their widespread use is impeded by high computational demands, limiting their applicability on resource-constrained devices and in contexts like medical image segmentation. This paper proposes AssistDistil, a semi-knowledge distillation technique designed to facilitate the transfer of knowledge from a larger teacher network to a more compact student model. During inference, the student model works in conjunction with the teacher model by condensing the teacher's latent information into its own latent representation, thereby boosting its representational capacity. The effectiveness of the proposed approach is demonstrated in multiple case studies across medical image segmentation tasks: eye segmentation, skin lesion segmentation, and chest X-ray segmentation. Experimental results on the IIITD Cataract Surgery, HAM10000, PH2, Shenzhen, and Montgomery chest X-ray datasets demonstrate the efficacy of the proposed approach both in terms of accuracy and computational cost. For example, in comparison to the AUNet-based teacher model, the proposed approach achieves a similar mIOU with only 0.5% of the model size. In the future, we plan to explore knowledge distillation approaches that improve the distillation process in the case of a large capacity gap between teacher and student networks. With fewer parameters, we intend for the student model to attain performance comparable to that of the teacher model without additional assistance.

- Publication: On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare (2024)
  Surbhi Mittal; Kartik Thakral; Tamar Glaser; Cristian Canton Ferrer; Tal Hassner
  Artificial Intelligence (AI) has seamlessly integrated into numerous scientific domains, catalysing unparalleled enhancements across a broad spectrum of tasks; however, its integrity and trustworthiness have emerged as notable concerns. The scientific community has focused on the development of trustworthy AI algorithms; however, machine learning and deep learning algorithms, popular in the AI community today, intrinsically rely on the quality of their training data. These algorithms are designed to detect patterns within the data, thereby learning the intended behavioural objectives. Any inadequacy in the data has the potential to translate directly into the algorithms trained on it. In this study, we discuss the importance of responsible machine learning datasets through the lens of fairness, privacy, and regulatory compliance, and present a large audit of computer vision datasets. Despite the ubiquity of fairness and privacy challenges across diverse data domains, current regulatory frameworks primarily address human-centric data concerns. We therefore focus our discussion on biometric and healthcare datasets, although the principles we outline are broadly applicable across various domains. The audit is conducted through evaluation of the proposed responsible rubric. After surveying over 100 datasets, our detailed analysis of 60 distinct datasets highlights a universal susceptibility to fairness, privacy, and regulatory compliance issues. This finding emphasizes the urgent need to revise dataset creation methodologies within the scientific community, especially in light of global advancements in data protection legislation. We assert that our study is critically relevant in the contemporary AI context, offering insights and recommendations that are both timely and essential for the ongoing evolution of AI technologies.
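The responsible-rubric audit described in the abstract above can be pictured as a checklist scored per dimension. The sketch below is purely illustrative: the criteria names, weights, and scoring are assumptions for exposition, not the paper's actual rubric.

```python
# Hypothetical sketch of a rubric-style dataset audit across the three
# dimensions named above (fairness, privacy, regulatory compliance).
# All criteria below are illustrative assumptions, not the paper's rubric.

CRITERIA = {
    "fairness": ["demographic labels documented", "balanced subgroup coverage"],
    "privacy": ["informed consent recorded", "data anonymized"],
    "regulatory": ["license specified", "retention policy stated"],
}

def audit(dataset_properties):
    """Return a score in [0, 1] per dimension: fraction of checks satisfied."""
    scores = {}
    for dimension, checks in CRITERIA.items():
        passed = sum(1 for check in checks if dataset_properties.get(check, False))
        scores[dimension] = passed / len(checks)
    return scores

# Example: a dataset description that satisfies three of the six checks.
example = {
    "demographic labels documented": True,
    "license specified": True,
    "data anonymized": True,
}
print(audit(example))  # {'fairness': 0.5, 'privacy': 0.5, 'regulatory': 0.5}
```

A real audit would of course weight criteria differently and require evidence rather than booleans; the point is only that per-dimension scoring makes susceptibility to fairness, privacy, and compliance issues directly comparable across datasets.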
- Publication: Optimizing Skin Lesion Classification Via Multimodal Data and Auxiliary Task Integration (2024)
  Mahapara Khurshid
  The rising global prevalence of skin conditions, some of which can escalate to life-threatening stages if not timely diagnosed and treated, presents a significant healthcare challenge. This issue is particularly acute in remote areas, where limited access to healthcare often results in delayed treatment, allowing skin diseases to advance to more critical stages. One of the primary challenges in diagnosing skin diseases is their low inter-class variation: many exhibit similar visual characteristics, making accurate classification difficult. This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information. This approach mimics the diagnostic process employed by medical professionals. A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction. This component plays a crucial role in refining visual details and enhancing feature extraction, leading to improved differentiation between classes and, consequently, elevating the overall effectiveness of the model. The experimental evaluations have been conducted using the PAD-UFES-20 dataset, applying various deep-learning architectures. The results of these experiments demonstrate not only the effectiveness of the proposed method but also its potential applicability in under-resourced healthcare environments.
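The multimodal idea described in the last abstract (combining smartphone images with clinical and demographic fields) reduces, at its simplest, to fusing an image-feature vector with a small metadata vector before classification. The sketch below is a minimal illustration of that late-fusion step; the feature dimensions, field names, and stand-in encoders are assumptions, not the paper's architecture.

```python
import numpy as np

# Minimal late-fusion sketch: image features concatenated with clinical
# metadata, as in multimodal skin-lesion classification. The encoders and
# field names below are illustrative assumptions.

def encode_image(image):
    """Stand-in for a CNN feature extractor: per-channel means of an HxWx3 array."""
    return image.mean(axis=(0, 1))  # shape (3,)

def encode_metadata(age, lesion_diameter_mm, itches):
    """Normalize a few hypothetical clinical fields into a numeric vector."""
    return np.array([age / 100.0, lesion_diameter_mm / 50.0, float(itches)])

def fused_features(image, age, lesion_diameter_mm, itches):
    """Concatenate both modalities into one vector for a downstream classifier."""
    return np.concatenate([
        encode_image(image),
        encode_metadata(age, lesion_diameter_mm, itches),
    ])

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # toy smartphone image
x = fused_features(image, age=55, lesion_diameter_mm=6.0, itches=True)
print(x.shape)  # (6,) -> 3 image features + 3 metadata features
```

In practice the image branch would be a learned network and the fused vector would feed a classification head, with the super-resolution prediction attached as an auxiliary loss on the image branch; the concatenation step itself stays this simple.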