Extending styles parsing and RegEx search #52
Merged
11 commits
54ac34d
Adding regex search function to the RegexMatch and adding toggle flag…
b74f343
Extending style parsing for an element from inline class in head>styl…
7638aa0
Removed trailing whitespaces.
37187c2
Fixed nonetype issue if style tag is not there in head
41fda5c
changed to affirmative condition first
a6f2a8e
style fix
4531f3a
Merge branch 'master' into master
lukehsiao 2107c62
extending current styles if exists
7e2977b
add a test case
446440f
Merge branch 'master' of https://github.com/Prabh06/fonduer
d435623
reverted to original path for test_spacy_integration
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
<?xml version="1.0" encoding="iso-8859-1"?> | ||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> | ||
<meta charset="utf-8"> | ||
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"> | ||
<head> | ||
<style> | ||
.row-header{ | ||
background: #f1f1f1; | ||
} | ||
.col-header{ | ||
background: #f1f1f1; | ||
color: aquamarine; | ||
font-size: 18px; | ||
} | ||
.cell{ | ||
text-align: center; | ||
} | ||
</style> | ||
</head> | ||
<body> | ||
<h1>Types of viruses, coughs, and colds</h1> | ||
<p>Here is<br/>a line break</p> | ||
<p>I don't have <span>Brain Cancer</span>or the hiccups</p> | ||
<h1><span><p>See Table 1</p> Below.</span></h1> | ||
<h2>Common Ailments</h2> | ||
<table> | ||
<tbody animal="donkey"> | ||
<tr></tr> | ||
<tr hobbies="run:fast;jump:high" letter="Q" > | ||
<th class="col-header" type="phenotype" hobbies="work:hard;play:harder" >Disease</th> | ||
<th class="col-header" day="Monday">Location</th> | ||
<th class="col-header">Year</th> | ||
</tr> | ||
<tr> | ||
<th class="row-header">Polio and BC546 is <span>−</span>55<span>O</span>C cold.</th> | ||
<td class="cell" style="width:53pt"><p class="s6" style="padding-top: 1pt">-<span class="s5">Dublin to Milwaukee</span></p></td> | ||
<td class="cell">2001</td> | ||
</tr> | ||
<tr> | ||
<th> | ||
<table> | ||
<tr> | ||
<td class="row-header"> I don't like TIPL761 or Chicken Pox or pizza. Shingles is also bad. </td> | ||
</tr> | ||
</table> | ||
</th> | ||
<td class="cell">whooping cough</td> | ||
<td class="cell">2009</td> | ||
</tr> | ||
<tr> | ||
<th class="row-header">Scurvy</th> | ||
<td class="cell">Annapolis</td> | ||
<td class="cell"> Junction and Storage Temperature −55 to 150 o ? C</td> <!--dash is u'/u2212'--> | ||
</tr> | ||
</tbody> | ||
<caption> | ||
Table 1: Infectious diseases and where to find them. | ||
</caption> | ||
</table> | ||
<p> In between the tables there is a nasty case of heart attack </p> | ||
<table> | ||
<tbody> | ||
<tr> | ||
<th class="col-header">Problem</th> | ||
<th class="col-header">Cause</th> | ||
<th class="col-header">Cost</th> | ||
</tr> | ||
<tr> | ||
<th class="row-header">Arthritis</th> | ||
<td class="cell">Pokemon Go</td> | ||
<td class="cell">Free</td> | ||
</tr> | ||
<tr> | ||
<th class="row-header">Yellow<i>Fever</i></th> | ||
<td class="cell">Unicorns</td> | ||
<td class="cell">$17.75</td> | ||
</tr> | ||
<tr> | ||
<th class="row-header">Hypochondria</th> | ||
<td class="cell">Fear</td> | ||
<td class="cell">$100</td> | ||
</tr> | ||
</tbody> | ||
<caption> | ||
Table 2: Three ways to get Pneumonia and how much they cost. | ||
</caption> | ||
</table> | ||
<p> And here is a final sentence with warts. </p> | ||
</body> | ||
</html> |
Binary file not shown.
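The fixture above declares class rules in `<head><style>` and also uses inline `style` attributes (for example `class="cell" style="width:53pt"`). The following is a minimal sketch of how class rules might be combined with an inline style. It is not the code from this PR: the function names are hypothetical, the regex only handles simple `.class { ... }` selectors, and the choice to let inline declarations override class declarations is an assumption based on normal CSS precedence.

```python
import re
from typing import Dict, Optional


def parse_declarations(body: str) -> Dict[str, str]:
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    decls = {}
    for part in body.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls[prop.strip()] = value.strip()
    return decls


def parse_class_rules(css: Optional[str]) -> Dict[str, Dict[str, str]]:
    """Collect declarations for simple '.class { ... }' rules.

    Returns an empty dict when there is no <style> text at all.
    """
    if not css:
        return {}
    rules = {}
    for name, body in re.findall(r"\.([\w-]+)\s*\{([^}]*)\}", css):
        rules.setdefault(name, {}).update(parse_declarations(body))
    return rules


def effective_style(css: Optional[str], class_attr: Optional[str],
                    inline_style: Optional[str]) -> Dict[str, str]:
    """Merge class rules with the inline style (inline wins; an assumption)."""
    rules = parse_class_rules(css)
    merged: Dict[str, str] = {}
    for name in (class_attr or "").split():
        merged.update(rules.get(name, {}))
    merged.update(parse_declarations(inline_style or ""))
    return merged


# Rules copied from the fixture's <style> block.
css = """
.row-header{ background: #f1f1f1; }
.col-header{ background: #f1f1f1; color: aquamarine; font-size: 18px; }
.cell{ text-align: center; }
"""
cell_style = effective_style(css, "cell", "width:53pt")
```

Passing `None` for `css` covers the case where the document has no `<style>` tag in its head, which returns only the inline declarations.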
Is it necessary to have both of these flags? It seems like these should never both be true. Only one or the other would be true at one time, if I understand correctly.
I would prefer just having `self.search`.
`self.full_match` toggles appending `$` to the regex. The flag is needed because `$` matches only at the end of the string, but the expression can occur inside the span rather than at its end.
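A small illustration of the point about `$`, using Python's standard `re` module. The helper below is hypothetical and not Fonduer's implementation; only the flag names `full_match` and `search` come from this thread, and the way they are combined here is an assumption.

```python
import re


def regex_matches(pattern: str, text: str,
                  full_match: bool = True, search: bool = False) -> bool:
    """Hypothetical helper showing the effect of the two flags."""
    # full_match appends "$", so the pattern must run to the end of the text.
    rgx = pattern + "$" if full_match else pattern
    compiled = re.compile(rgx)
    # search scans for a match anywhere; match only tries at position 0.
    matcher = compiled.search if search else compiled.match
    return matcher(text) is not None


span = "Temperature 55 to 150 C"
pattern = r"\d+ to \d+"

# With "$" appended, search still fails: "55 to 150" is followed by " C".
anchored = regex_matches(pattern, span, full_match=True, search=True)

# Without "$", search finds the expression in the middle of the span.
unanchored = regex_matches(pattern, span, full_match=False, search=True)
```

Here `anchored` is `False` and `unanchored` is `True`, which is why searching inside a span needs a way to turn off the trailing `$`.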
I see. Then this looks good to me, thanks!