This project constructs sentence embeddings for legal language and analyzes the semantic similarity between two sentences. The code is based on the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings".
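As a rough reminder of the method from that paper (a sketch, not this repository's exact code): each word w is weighted by a / (a + p(w)), where p(w) is the word's estimated frequency and a is a small constant (around 1e-3 in the paper); a sentence embedding is the weighted average of its word vectors, after which the projection onto the first principal component of the embedding matrix is removed. A minimal illustration of the weighting step, where `word_vec` and `word_freq` are assumed stand-ins for the generated vectors and weight4ind data:

```python
# Minimal sketch of the SIF weighted average (illustrative only; `word_vec`
# and `word_freq` are stand-ins, not this repository's actual data structures).
import numpy as np

def sif_average(sentence_tokens, word_vec, word_freq, a=1e-3):
    """Weighted average of word vectors with SIF weights a / (a + p(w))."""
    vecs = [a / (a + word_freq.get(w, 0.0)) * word_vec[w]
            for w in sentence_tokens if w in word_vec]
    if not vecs:
        return np.zeros(len(next(iter(word_vec.values()))))
    return np.mean(vecs, axis=0)

# Toy usage with 3-dimensional vectors.
word_vec = {"the": np.array([0.1, 0.2, 0.3]), "court": np.array([0.4, 0.1, 0.0])}
word_freq = {"the": 0.05, "court": 0.001}
print(sif_average(["the", "court"], word_vec, word_freq))
```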
The code is written in Python and requires numpy, pandas, matplotlib, scipy, pickle, sklearn, and gensim.
To install all dependencies, virtualenv is suggested:
$ virtualenv .env
$ . .env/bin/activate
$ pip install -r requirements.txt
In the folder examples:

- Generate_Basic_Files_WE.py generates the words file, the word embeddings file, and the weight4ind file from a pretrained word embeddings file, such as fastText or GloVe.
- Generate_Basic_Files_Gensim.py generates the words file, the word embeddings file, and the weight4ind file from word embeddings in the gensim output format.
- sif_embedding.py generates a sentence embedding given a sentence.
- sim_sif.py generates sentence embeddings for a given .txt file, and outputs the Pearson's Coefficient and the MSE.
- generate_first_pc.py computes the first principal component given a set of sentences.
The sentence embeddings are based on a pre-trained word embeddings library, such as word2vec, GloVe, or fastText.
First, use Generate_Basic_Files_WE.py or Generate_Basic_Files_Gensim.py to generate the three files needed by the later steps, namely words, vectors, and weight4ind. The source file of the word embeddings library and the output location should be passed into the main function.
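For illustration only, a call along these lines would pass the embeddings source and the output location; the import path, argument order, and file paths here are assumptions and the script's actual main() signature may differ:

```python
# Hypothetical invocation; adjust the paths and the main() signature to
# match the actual script in this repository.
from Generate_Basic_Files_WE import main

main("embeddings/wiki-news-300d-1M.vec",  # pretrained fastText/GloVe file (example path)
     "data/")                             # output location for words, vectors, weight4ind
```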
Second, use generate_first_pc.py to compute the first principal component of a set of sentences in the dataset.
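Conceptually, this step amounts to fitting a rank-1 truncated SVD on the matrix of weighted-average sentence embeddings and keeping its first component, which is then subtracted from each embedding. A minimal sketch (the embedding matrix below is a random placeholder, not data produced by this repository):

```python
# Sketch of computing and removing the first principal component
# (illustrative; `emb` stands for the matrix of weighted-average embeddings).
import numpy as np
from sklearn.decomposition import TruncatedSVD

emb = np.random.rand(100, 300)             # placeholder: 100 sentences, 300-dim vectors

svd = TruncatedSVD(n_components=1, n_iter=7)
svd.fit(emb)
pc = svd.components_[0]                     # first principal component, shape (300,)

# Remove each embedding's projection onto the first principal component.
emb_sif = emb - np.outer(emb @ pc, pc)
```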
Third, use sif_embedding.py to generate a sentence embedding for a given sentence, or use sim_sif.py to produce the evaluation results (Pearson's Coefficient and MSE) on the test dataset.
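As a rough illustration of what such an evaluation looks like (not the exact code in sim_sif.py; the embeddings and gold scores below are made-up placeholders): cosine similarity between sentence embeddings is compared against gold similarity scores via Pearson's coefficient and the mean squared error.

```python
# Illustrative evaluation sketch; embeddings and gold scores are placeholders.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

emb_a = np.random.rand(5, 300)              # embeddings of the first sentences
emb_b = np.random.rand(5, 300)              # embeddings of the second sentences
gold = np.array([0.84, 0.20, 0.70, 0.56, 0.10])   # placeholder gold similarities

pred = np.array([cosine(a, b) for a, b in zip(emb_a, emb_b)])
r, _ = pearsonr(pred, gold)
mse = mean_squared_error(gold, pred)
print(f"Pearson's coefficient: {r:.4f}, MSE: {mse:.4f}")
```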
More details can be found in the comments in the code.