Text Reconstruction from Contextualized Embeddings (Kai Kugler, Simon Münker & Johannes Höhmann)
Supervisor: Prof. Dr. Achim Rettinger
State-of-the-art deep learning models in natural language processing are considered black boxes: they capture syntactic and semantic patterns of an input sequence and transform them into complex numerical representations (embeddings). While these representations are well suited to further processing and downstream tasks, they are often criticized for not being directly human-interpretable.
The idea of this thesis is to take advantage of this flaw: assuming that reconstructing the original text from an embedding is impossible or too computationally expensive, researchers might be able to make their correspondingly transformed corpora publicly available without committing copyright infringement. Investigating whether, and under which conditions, this assumption holds is the core focus of this work.