Commit

sync with Google drive files
shun-lin committed Oct 20, 2019
1 parent 6168608 commit 2cd9a21
Showing 21 changed files with 237,741 additions and 4,402 deletions.
1 change: 1 addition & 0 deletions BERT_Evaluation.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions ColabTesting.ipynb

Large diffs are not rendered by default.

3,588 changes: 1 addition & 3,587 deletions Data Processing.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions GPT.ipynb

Large diffs are not rendered by default.

Binary file added GPT_output_jokes.docx
Binary file not shown.
1 change: 1 addition & 0 deletions GPT_with_validation.ipynb

Large diffs are not rendered by default.

1,500 changes: 1,500 additions & 0 deletions LM_output_jokes.txt

Large diffs are not rendered by default.

1,500 changes: 1,500 additions & 0 deletions LSTM_output_jokes.txt

Large diffs are not rendered by default.

Binary file added Project Lion Poster.pdf
Binary file not shown.
Binary file added Project Lion Project Report.docx
Binary file not shown.
2 changes: 1 addition & 1 deletion QA Jokes.ipynb

Large diffs are not rendered by default.

815 changes: 1 addition & 814 deletions Simple RNN Test.ipynb

Large diffs are not rendered by default.

44 changes: 44 additions & 0 deletions capita.py
@@ -0,0 +1,44 @@
# Capita is a small library that preprocesses capitalization in text
# and can undo that preprocessing (via unprocess_capitalization).
# It is used in the Summarization notebook to turn raw text into words that can then be tokenized and numericalized.
# You do not need to modify this file.


from segtok import tokenizer

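def preprocess_capitalization(text):
    """Lowercase every word, prefixing a marker token where case was present."""
    words = tokenizer.word_tokenizer(text)
    final_words = []
    for word in words:
        if not word.isalpha():
            # Numbers, punctuation, and mixed tokens get no marker.
            final_words.append(word.lower())
        else:
            if word.islower():
                pass
            elif word.isupper():
                # Entire word is uppercase, e.g. "NASA".
                final_words.append("⇧")
            else:
                # Capitalized ("Apollo") and mixed-case ("iPhone") words share
                # one marker, so mixed case is not restored exactly.
                final_words.append("↑")

            final_words.append(word.lower())
    return " ".join(final_words)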

def unprocess_capitalization(text):
    """Undo preprocess_capitalization by applying each marker to the next word."""
    words = text.split(" ")
    final_words = []
    all_caps = False
    capitalized = False
    for w in words:
        if w == "⇧":
            all_caps = True
        elif w == "↑":
            capitalized = True
        else:
            final_word = w
            if all_caps:
                final_word = final_word.upper()
            elif capitalized:
                # Uppercase only the first character; the rest is left as-is.
                final_word = final_word[:1].upper() + final_word[1:]
            final_words.append(final_word)
            # A marker applies to a single word, so reset both flags.
            all_caps = False
            capitalized = False

    return " ".join(final_words)
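
A minimal, self-contained sketch of the round trip that capita.py performs may help when reading the file. It is an illustration, not the committed code: `simple_tokenize` is a hypothetical regex stand-in for `segtok`'s tokenizer (so that the sketch runs without extra packages), and the names `preprocess` and `unprocess` are shortened for the example. The marker logic follows the file above.

```python
import re

ALL_CAPS = "⇧"      # marker emitted before a word that was entirely uppercase
CAPITALIZED = "↑"   # marker emitted before any other word containing uppercase


def simple_tokenize(text):
    # Assumption: a crude replacement for segtok's word tokenizer.
    # Splits into runs of word characters or single punctuation symbols.
    return re.findall(r"\w+|[^\w\s]", text)


def preprocess(text):
    out = []
    for word in simple_tokenize(text):
        if word.isalpha():
            if word.isupper():
                out.append(ALL_CAPS)
            elif not word.islower():
                out.append(CAPITALIZED)
        out.append(word.lower())
    return " ".join(out)


def unprocess(text):
    out = []
    all_caps = capitalized = False
    for token in text.split(" "):
        if token == ALL_CAPS:
            all_caps = True
        elif token == CAPITALIZED:
            capitalized = True
        else:
            if all_caps:
                token = token.upper()
            elif capitalized:
                token = token[:1].upper() + token[1:]
            out.append(token)
            all_caps = capitalized = False
    return " ".join(out)


encoded = preprocess("NASA launched Apollo")
print(encoded)             # ⇧ nasa launched ↑ apollo
print(unprocess(encoded))  # NASA launched Apollo

# Mixed-case words share the "capitalized" marker, so they do not round-trip.
print(unprocess(preprocess("iPhone")))  # Iphone
```

The design keeps the vocabulary small: a model sees only lowercase words plus two marker tokens. The cost is that mixed-case words and the original spacing around punctuation are not recovered exactly.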
231,657 changes: 231,657 additions & 0 deletions jokes.txt

Large diffs are not rendered by default.

0 comments on commit 2cd9a21
