
Contrastive Steering Vectors for Autoencoder Explainability

Journal
Electronics
ISSN
2079-9292
Publisher
MDPI
Date Issued
2025
Author(s)
González Mora, José Guillermo
Facultad de Ingeniería - CampCM  
Ponce, Hiram  
Facultad de Ingeniería - CampCM  
Martínez-Villaseñor, Lourdes  
Facultad de Ingeniería - CampCM  
Type
Journal article
DOI
10.3390/electronics14183586
URL
https://scripta.up.edu.mx/handle/20.500.12552/12450
Abstract
Generative models, particularly autoencoders, often function as black boxes, making it challenging for non-expert users to effectively control the generation process and understand how inputs affect outputs. Existing methods for improving interpretability and control frequently require specific model training regimes or labeled data, limiting their applicability. This work introduces a novel approach to enhance the controllability and explainability of generative models, specifically tested on autoencoders with entangled latent spaces. We propose using a semi-supervised contrastive learning setup to learn steering vectors. These vectors, when added to an input’s latent representation, effectively manipulate specific attributes in the generated output without conditional training of the model or attribute classifiers, thus being applicable to pretrained models and avoiding compound classification errors. Furthermore, we leverage these learned steering vectors to interpret and explain the decoding process of a target attribute, allowing for efficient exploration of feature dimension interactions and the construction of an interpretable plot of the generative process, while lowering scalability limitations of perturbation-based Explainable AI (XAI) methods by reducing the search space. Our method provides an efficient pathway to controllable generation, offers an interpretable result of the model’s internal mechanisms, and relates the interpretations to human-understandable explanation questions. ©The authors ©MDPI AG ©Electronics.
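The core operation described in the abstract, adding a steering vector to an input's latent representation, can be illustrated with a minimal sketch. This is not the authors' method: the paper learns its vectors with a semi-supervised contrastive setup, whereas the sketch below substitutes a simple difference of group means as a stand-in. The function names, the synthetic latent codes, and the strength parameter `alpha` are all assumptions made for illustration.

```python
import numpy as np

def mean_difference_vector(z_pos, z_neg):
    """Stand-in steering vector (assumption): difference of the mean latent
    codes of samples with and without the target attribute."""
    return z_pos.mean(axis=0) - z_neg.mean(axis=0)

def steer(z, v, alpha=1.0):
    """Shift latent code(s) z along steering vector v with strength alpha."""
    return z + alpha * v

# Synthetic latent codes standing in for the output of a pretrained encoder.
rng = np.random.default_rng(0)
latent_dim = 8
attribute_shift = np.zeros(latent_dim)
attribute_shift[2] = 3.0  # hypothetical: the attribute mostly lives in dim 2

z_neg = rng.normal(size=(200, latent_dim))                    # attribute absent
z_pos = rng.normal(size=(200, latent_dim)) + attribute_shift  # attribute present

v = mean_difference_vector(z_pos, z_neg)

# Steer one attribute-absent code toward the attribute.
z = z_neg[0]
z_steered = steer(z, v, alpha=1.0)
```

In a real pipeline, `z_steered` would then be passed through the pretrained decoder, which is left untouched; only the latent code changes, consistent with the abstract's point that no conditional retraining of the model is required.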
Subjects

Interpretability

Explainable AI

Steering vectors

Contrastive learning

Attribute manipulatio...

Image generative mode...

License
Open Access
URL License
https://creativecommons.org/licenses/by-nc-sa/4.0/
How to cite
González Mora, J. G., Ponce, H., & Martínez-Villaseñor, L. (2025). Contrastive Steering Vectors for Autoencoder Explainability. Electronics, 14(18), 3586. https://doi.org/10.3390/electronics14183586
Table of contents
Abstract -- Introduction -- Related Work -- Methodology -- Discussion -- Conclusions and Future Work -- Author Contributions -- Funding -- Data Availability Statement -- Conflicts of Interest.
