Is Anisotropy Really the Cause of BERT Embeddings not being Semantic?

Fuster Baggetto, Alejandro


Fuster Baggetto, Alejandro. (2022). Is Anisotropy Really the Cause of BERT Embeddings not being Semantic? Master Thesis, Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial

Files
Name			Description	MIME type		Size
Fuster_Baggetto_Alejandro_TFM.pdf			Fuster_Baggetto_Alejandro_TFM.pdf		application/pdf	1.66MB

Title	Is Anisotropy Really the Cause of BERT Embeddings not being Semantic?
Author(s)	Fuster Baggetto, Alejandro
Abstract	We conduct a set of experiments aimed at improving our understanding of the lack of semantic isometry (correspondence between the embedding and meaning spaces) of BERT's contextual word embeddings. Our empirical results show that, contrary to popular belief, anisotropy is not the root cause of the poor performance of these contextual models' embeddings in semantic tasks. What does affect both anisotropy and semantic isometry is a set of biased tokens that distort the space with non-semantic information. For each bias category (frequency, subword, punctuation, and case), we measure its magnitude and the effect of its removal. We show that these biases contribute to, but do not completely explain, the anisotropy and lack of semantic isometry of these models. Therefore, we hypothesise that finding new biases will contribute to correcting the representation degradation problem. Finally, we propose a new similarity method aimed at smoothing the negative effect of biased tokens on semantic isometry and at increasing the explainability of semantic similarity scores. We experiment with this method in depth, analysing its strengths and weaknesses, and propose future applications for it.
Additional notes	Master's thesis (Trabajo de Fin de Máster), Máster Universitario en Investigación en Inteligencia Artificial, UNED
Subject(s)	Computer Engineering
Keywords	semantic textual similarity; sentence embeddings; transformers; natural language processing; deep learning
Publisher(s)	Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial
Supervisor	Fresno Fernández, Víctor
Date	2022-09-01
Format	application/pdf
Identifier	bibliuned:master-ETSInformatica-IIA-Afuster http://e-spacio.uned.es/fez/view/bibliuned:master-ETSInformatica-IIA-Afuster
Language	English
Publication version	acceptedVersion
Access level and license	http://creativecommons.org/licenses/by-nc-nd/4.0 info:eu-repo/semantics/openAccess
Resource type	Master's thesis
Access type	Open access

Document type:	Master's thesis
Collections:	Máster Universitario en Investigación en Inteligencia Artificial; OpenAIRE set; Master's thesis (trabajo fin de máster) items set

Access statistics:	215 views, 171 downloads
Created:	Thu, 14 Sep 2023, 20:52:56 CET
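The abstract turns on two measurements: how anisotropic the embedding space is, and how much that changes when biased tokens are removed. A minimal sketch of that kind of before/after comparison follows. It assumes one common way of quantifying anisotropy, the mean cosine similarity between randomly sampled pairs of token embeddings; this estimator is an assumption, not necessarily the thesis's exact procedure. The data is synthetic, and the `is_biased` mask is a hypothetical stand-in for detecting the frequency, subword, punctuation and case biases the thesis studies. No BERT model is loaded.

```python
import numpy as np


def anisotropy(embeddings: np.ndarray, n_pairs: int = 10_000, seed: int = 0) -> float:
    """Estimate anisotropy as the mean cosine similarity between randomly
    sampled pairs of distinct rows of `embeddings`.

    Values near 0 suggest an isotropic space; values near 1 mean the
    vectors all point in roughly the same direction.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j  # drop self-pairs, whose cosine is trivially 1
    a, b = embeddings[i[keep]], embeddings[j[keep]]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(cos.mean())


# Hypothetical synthetic data standing in for contextual token embeddings:
# every vector shares a small common offset along one direction, and the
# "biased" tokens share a much larger offset along that same direction.
rng = np.random.default_rng(42)
dim = 64
direction = np.zeros(dim)
direction[0] = 1.0  # the shared, non-semantic direction
semantic = rng.normal(size=(700, dim)) + 2.0 * direction
biased = rng.normal(size=(300, dim)) + 8.0 * direction
tokens = np.vstack([semantic, biased])
is_biased = np.array([False] * 700 + [True] * 300)  # hypothetical bias mask

before = anisotropy(tokens)
after = anisotropy(tokens[~is_biased])
print(f"anisotropy with biased tokens:    {before:.3f}")
print(f"anisotropy without biased tokens: {after:.3f}")
```

On this toy data, dropping the masked tokens lowers the estimate but does not bring it to zero, because the remaining vectors still share a small offset. This is only an illustration of the shape of the abstract's claim that these biases contribute to, but do not fully explain, anisotropy. It is not a reproduction of the thesis's results.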
