Ensure only ASCII character set used #12
ELSE
PRINT *,' '
PRINT *,'Troubles, with ',problem_line_count,' lines.'
PRINT *,'File uses only ISO-8859 character codes, outside the standard ASCII range of ',FIRST_VALID,' to ',LAST_VALID
Instead of "File uses only ISO-8859 character codes, outside the standard ASCII range of", perhaps something like "File uses character codes outside the standard ASCII range of"
Mike,
The new print statement:
PRINT *,'File uses character codes outside the standard ASCII range of ',FIRST_VALID,' to ',LAST_VALID
EXIT big_read_loop
END IF

DO ind = 1 , MAX_LENGTH
The \tab character is fairly prominent in our code, and has ASCII code 9 (which is outside of the 32-127 range). Should this loop include that exception?
Coincidentally, none of the 9 files that I tested contains a tab character. I have now fixed the logic in the code to ignore all tab characters (ASCII code 9).
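For readers following along, here is a minimal sketch of a range check with a tab exception. It is not the code from this pull request. The bounds 32 and 127 are taken from the range quoted in the discussion, and the module name `ascii_check_sketch` and the helpers `is_allowed_code` and `first_bad_column` are hypothetical names introduced only for illustration.

```fortran
MODULE ascii_check_sketch
   IMPLICIT NONE
   ! Assumed values: the thread quotes a 32-127 range and tab as code 9.
   INTEGER, PARAMETER :: FIRST_VALID = 32
   INTEGER, PARAMETER :: LAST_VALID  = 127
   INTEGER, PARAMETER :: TAB_CODE    = 9
CONTAINS

   ! Hypothetical helper: .TRUE. when a character code is acceptable,
   ! i.e. it is a tab or lies inside FIRST_VALID..LAST_VALID.
   PURE LOGICAL FUNCTION is_allowed_code ( code )
      INTEGER, INTENT(IN) :: code
      is_allowed_code = ( code .EQ. TAB_CODE ) .OR. &
                        ( code .GE. FIRST_VALID .AND. code .LE. LAST_VALID )
   END FUNCTION is_allowed_code

   ! Hypothetical helper: column of the first disallowed character in a
   ! line, or 0 when every character is acceptable.
   PURE INTEGER FUNCTION first_bad_column ( line )
      CHARACTER(LEN=*), INTENT(IN) :: line
      INTEGER :: ind
      first_bad_column = 0
      DO ind = 1 , LEN_TRIM(line)
         ! ICHAR of a non-ASCII byte is processor dependent; whether it comes
         ! back above 127 or negative, it falls outside the allowed set.
         IF ( .NOT. is_allowed_code ( ICHAR ( line(ind:ind) ) ) ) THEN
            first_bad_column = ind
            EXIT
         END IF
      END DO
   END FUNCTION first_bad_column

END MODULE ascii_check_sketch
```

Keeping the exception inside one small predicate means any further exceptions could be added in a single place rather than inside the read loop.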
! usage:
! a.out < file.F

PROGRAM finder
Could we name this something more self-descriptive? Like "nonasciifinder"?
Mike,
The source code now lists the main program as "non_ascii_finder"
line_count = 1
problem_line_count = 0

! Loop over eah line of the input file.
"eah" -> "each"
Mike,
"each" now has a "c", also added "ubiquitously" to get right back up on that spelling horse.
! Output: GA --- â(x)
! Purpose: Compute the gamma function Ahat(x)
! Input : x --- Argument of Ahat(x)
! ( x is not equal to 0,-1,-2,WHAT GOES HERE )
@ravanah
What should be in that original u'u'u' string?
Ravan,
It was pointed out that maybe this is supposed to be "..."?
Dave
I agree that this was probably meant to indicate "et cetera".
Aren't you going to commit updated var/convertor/wave2grid_kma/pvchkdv.F as well?
! line number and column count (for subsequent editing).

! usage:
! a.out file.F
How about adding the following (from your commit message) to the program itself?
build the finder program: gfortran -ffree-form finder.F
a.out some-file-name.F
…excluded

TYPE: bug fix

KEYWORDS: ISO, ASCII, sed, byte

SOURCE: internal

DESCRIPTION OF CHANGES:
Authors of a few physics schemes likely used a "cut-and-paste" technique for including references and for units. The offending references used quite a few different characters for an intended dash (minus sign). The offending units all used a superscript numeral 2 to mean "squared", as in W/m^2. I changed some to m^2 and some to m2, as both are used in the modified schemes. There were a few other single modifications (an "a" with a caret hat, etc.). All of the changes were to commented lines. The changes are necessary to allow the use of sed to process the source code.
Outside of the physics directory, a number of files also had characters outside of the Fortran character set (32-127). These were all in comments, but are still being removed.

LIST OF MODIFIED FILES:
chem/module_cam_mam_newnuc.F
chem/module_gocart_dmsemis.F
chem/module_gocart_seasalt.F
chem/module_mozcart_wetscav.F
chem/module_sea_salt_emis.F
dyn_em/module_sfs_driver.F
dyn_em/module_sfs_nba.F
frame/module_cpl.F
hydro/Routing/module_gw_gw2d.F
phys/module_bl_mfshconvpbl.F
phys/module_gocart_seasalt.F
phys/module_ltng_cpmpr92z.F
phys/module_ltng_crmpr92.F
phys/module_ltng_iccg.F
phys/module_mp_nssl_2mom.F
phys/module_mp_wdm6.F
phys/module_sf_bem.F
phys/module_sf_bep.F
phys/module_sf_bep_bem.F
var/convertor/wave2grid_kma/pvchkdv.F (Thanks Jamie!)

TESTS CONDUCTED:
The sed program works on the modified files, and does not work on the original files.
@jamiebresch and @mkavulich Can you guys review again to see if my pull request may now proceed? Thanks
@davegill We would like to have tools/find.F renamed to tools/non_ascii_finder.F
@davegill Would it be hard to allow the verbosity level to be specified on the command-line, and to let the user give a list of files to be scanned as command-line arguments? I'm imagining something like this:
Michael,
a.out
a.out -v
a.out -v non_ascii_finder.F
a.out -V non_ascii_finder.F
a.out -VV fortran_2003_fflush_test.G
a.out -v fortran_2003_fflush_test.F
a.out -V fortran_2003_fflush_test.F
a.out -VV fortran_2003_fflush_test.F
Dave
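As an illustration of the command-line handling being discussed, here is a rough sketch using the Fortran 2003 intrinsics `COMMAND_ARGUMENT_COUNT` and `GET_COMMAND_ARGUMENT`. It is not the PR's implementation. The flags `-v`, `-V`, and `-VV` come from the thread, but the integer levels assigned to them, the default level, the 256-character argument buffer, and the names `verbosity_sketch`, `verbosity_level`, and `report_arguments` are all assumptions made for the example.

```fortran
MODULE verbosity_sketch
   IMPLICIT NONE
CONTAINS

   ! Hypothetical mapping of the flags to integer levels (values assumed):
   ! -v -> 1, -V -> 2, -VV -> 3. Anything else returns 0 and is treated
   ! by the caller as a file name.
   PURE INTEGER FUNCTION verbosity_level ( arg )
      CHARACTER(LEN=*), INTENT(IN) :: arg
      SELECT CASE ( TRIM(arg) )
         CASE ( '-v' )
            verbosity_level = 1
         CASE ( '-V' )
            verbosity_level = 2
         CASE ( '-VV' )
            verbosity_level = 3
         CASE DEFAULT
            verbosity_level = 0
      END SELECT
   END FUNCTION verbosity_level

   ! Walk the command line once: flags update the current verbosity, and
   ! every other argument is reported as a file that would be scanned.
   SUBROUTINE report_arguments
      CHARACTER(LEN=256) :: arg
      INTEGER :: i , level , verbosity
      verbosity = 1          ! assumed default when no flag is given
      DO i = 1 , COMMAND_ARGUMENT_COUNT()
         CALL GET_COMMAND_ARGUMENT ( i , arg )
         level = verbosity_level ( arg )
         IF ( level .GT. 0 ) THEN
            verbosity = level
         ELSE
            PRINT *,'would scan ',TRIM(arg),' at verbosity ',verbosity
         END IF
      END DO
   END SUBROUTINE report_arguments

END MODULE verbosity_sketch
```

A main program would only need to `USE verbosity_sketch` and `CALL report_arguments`. Treating every non-flag argument as a file name is what would let a list of files be given in one invocation.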
PRINT *,'where <verbose level> is -v when using this program with "find", and'
PRINT *,' <verbose level> is -V when processing a single file'
! PRINT *,' <verbose level> is -VV is for developers and debugging'
PRINT *,'where <filename> is a WRF Fortran source file'
@davegill I suppose I may be nit-picking at this point, but why does the input have to be a WRF Fortran source file? Taking a more general view, this utility tells whether there are any characters outside the set of printable ASCII characters (or characters not acceptable to cpp, or whatever). Also, stating that -v is used when using the program with find doesn't really help anyone to understand what the effect of using -v actually is. Also, stating that -V is used when processing a single file can suggest that the program can process multiple files.
Generally, we could reconsider the printout produced by this program with a broader view of what the program could potentially be used for.
@davegill It's purely academic at this point, but it might be interesting to try to detect UTF-8 multi-byte encodings. For example, in the
Really, though, the 68th and 69th characters together form a UTF-8 character; their binary encoding is This explains why the line
correctly shows the superscript 2 as a single character, but the lines
can't show any character.
@mgduda There are tons of languages and encoding methods. For the source code purpose, I think ASCII-only is a good rule.
@jamiebresch Agreed. To be clear, I was definitely not suggesting that we allow anything but printable ASCII characters in the source code (I think this may even be part of the Fortran standard). Rather, I was only saying that, because UTF-8 can encode more or less every language and is by far the most widely used encoding on, e.g., the web, the checker program could be more clever and recognize multi-byte UTF-8 encodings for what they are, rather than printing two, three, or four error messages for the same multi-byte character. My previous comment only came about because I noticed that some of the messages from the checker referenced two characters when I could only find one in the source code (e.g., the superscript 2); I started looking further and thought the UTF-8 encoding bit was pretty cool.
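To make the multi-byte idea concrete, here is a small sketch of how a checker might recognize a UTF-8 lead byte and so report one problem per character instead of one per byte. It is an illustration only, not part of the PR. It relies on the standard UTF-8 layout, in which the lead byte determines the sequence length; the superscript 2 (U+00B2), for instance, is the two bytes 194 and 178. The names `utf8_sketch`, `utf8_sequence_length`, and `count_multibyte_chars` are hypothetical, and continuation bytes are deliberately not validated.

```fortran
MODULE utf8_sketch
   IMPLICIT NONE
CONTAINS

   ! Length in bytes of the UTF-8 sequence introduced by a lead byte (0-255).
   ! Returns 0 for a continuation byte (128-191) and for byte values that
   ! can never start a well-formed sequence (192, 193, 245-255).
   PURE INTEGER FUNCTION utf8_sequence_length ( byte )
      INTEGER, INTENT(IN) :: byte
      IF      ( byte .GE.   0 .AND. byte .LE. 127 ) THEN
         utf8_sequence_length = 1      ! plain ASCII
      ELSE IF ( byte .GE. 194 .AND. byte .LE. 223 ) THEN
         utf8_sequence_length = 2
      ELSE IF ( byte .GE. 224 .AND. byte .LE. 239 ) THEN
         utf8_sequence_length = 3
      ELSE IF ( byte .GE. 240 .AND. byte .LE. 244 ) THEN
         utf8_sequence_length = 4
      ELSE
         utf8_sequence_length = 0
      END IF
   END FUNCTION utf8_sequence_length

   ! Count multi-byte characters in a line by counting lead bytes only, so
   ! a two-byte character is reported once rather than twice.
   PURE INTEGER FUNCTION count_multibyte_chars ( line )
      CHARACTER(LEN=*), INTENT(IN) :: line
      INTEGER :: ind , byte
      count_multibyte_chars = 0
      DO ind = 1 , LEN_TRIM(line)
         ! IAND maps a possibly negative ICHAR result into 0-255.
         byte = IAND ( ICHAR ( line(ind:ind) ) , 255 )
         IF ( utf8_sequence_length ( byte ) .GE. 2 ) THEN
            count_multibyte_chars = count_multibyte_chars + 1
         END IF
      END DO
   END FUNCTION count_multibyte_chars

END MODULE utf8_sketch
```

A stricter version would also confirm that each lead byte is followed by the right number of continuation bytes before treating the group as one character.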
Below is a script version of a UTF-8 -> ASCII character converter.
Craig
#!/bin/sh
@(#) utf2ascii Convert UTF-8 to ASCII text
################################################
Convert file encoded in UTF-8 to ASCII text.
Usage: utf2ascii filename
################################################
###############################
Set trap to abort on signal
###############################
#####################################
Process command-line argument(s):
#####################################
exit
…rt_registry minor inconsequential removal of extra quote on memetum preturbations…
Synching up namelist templates