Contributors:
- Harshit Gupta
- Pratyaksh Gautam
The goal of Exemplar-Guided Paraphrase Generation (EGPG) is to produce a target sentence
that matches the style of a provided exemplar while preserving the content of the source
sentence.
To learn better representations of style and content, the paper proposes a novel approach
motivated by the recent success of contrastive learning, which has proven effective in
unsupervised feature-extraction tasks.
The core idea is to design two contrastive losses, one for content and one for style,
that exploit two characteristic properties of the problem during training.
Paper Citation: https://arxiv.org/pdf/2109.01484.pdf
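As a rough illustration of what one such contrastive loss can look like (the exact formulation used here lives in contrastive_loss.py and the paper), below is a minimal InfoNCE-style sketch in PyTorch. The function name, batch construction, and temperature value are our own assumptions, not the paper's:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss.

    Each anchor's positive is the row with the same index in `positives`;
    every other row in the batch serves as a negative. Both inputs are
    (batch, dim) tensors of sentence encodings.
    """
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    # (batch, batch) cosine-similarity matrix, sharpened by the temperature
    logits = a @ p.t() / temperature
    # the correct "class" for row i is column i
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Style loss: pull the target's style encoding toward its exemplar's.
# Content loss: pull the target's content encoding toward its source's.
```

A loss of this shape is applied twice during training, once on style encodings and once on content encodings.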
We use two datasets to train and evaluate our models:
- ParaNMT Dataset: paraphrase pairs generated automatically by back-translating the original English sentences from a different task.
- QQPos Dataset: compared to ParaNMT, the QQPos sentences are more formal.
From each dataset we use 93k sentences for training, 3k for validation, and 3k for testing.
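A minimal sketch of how such fixed-size splits could be carved out, assuming each dataset has been converted to a csv with one sentence pair per row (the function name, shuffling, and seed are assumptions, not the project's exact procedure):

```python
import pandas as pd

def make_splits(csv_path, n_train=93_000, n_val=3_000, n_test=3_000, seed=42):
    """Shuffle a dataset csv and carve out train/val/test splits of fixed size."""
    df = (pd.read_csv(csv_path)
            .sample(frac=1.0, random_state=seed)  # deterministic shuffle
            .reset_index(drop=True))
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:n_train + n_val + n_test]
    return train, val, test
```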
- contrastive_loss.py: Vectorized and optimized code for the style and content losses.
- exemplar_gen.py: Code for generating exemplar sentences from the two datasets.
- final_model.py: Full implementation of the model and other relevant functions in PyTorch.
- model_nll_loss.py: Code for the model trained with NLL loss only.
- paranmt-txt-to-csv.py: Code to convert the txt file to csv format (you can handle the datasets as you wish).
- Project Report.pdf: Final report with analysis of the results and metrics, plus qualitative analysis of generated sentences on the test set.
- Project Presentation.pdf: Slides presenting the methodology and results; similar to the project report.
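For reference, a conversion of the kind paranmt-txt-to-csv.py performs could be sketched as follows. The tab delimiter and the column names are assumptions; check the actual script against your copy of the data:

```python
import csv

def txt_to_csv(txt_path, csv_path, delimiter="\t"):
    """Convert a delimiter-separated text file of sentence pairs to csv."""
    with open(txt_path, encoding="utf-8") as fin, \
         open(csv_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        writer.writerow(["source", "paraphrase"])  # assumed column names
        for line in fin:
            parts = line.rstrip("\n").split(delimiter)
            if len(parts) == 2:  # skip malformed lines
                writer.writerow(parts)
```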