
Additional curves 'UNKNOWN:1' added and phantom columns made #332

Closed
bezova opened this issue May 10, 2020 · 2 comments

bezova commented May 10, 2020

When I use lasio.read(file, ignore_data=True), the list of returned curves is correct (see the file below). However, when the full file is read using lasio.read(file), the curve list has additional mnemonics:

'UNKNOWN:1', 'UNKNOWN:2'

and the data columns and shape are all wrong (the data is read with more columns than there actually are).

I am not sure, but I guess this is because some columns contain strings (dates). What worries me is that a different curve set is returned depending on ignore_data, and no errors are generated.

Also, this warning is generated on the first run (but only once):

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison   arr[arr == null] = np.nan

How to reproduce:

lasio.__version__ = '0.25.1.dev22+g07d61ef'
Python 3.7.7 

las file (attached: it is from public source):
Bramblett_1-28_WRD_4H_LAS_Standard.las.zip

Output
lasio.read(file, ignore_data=True).keys()
['MD', 'INCL', 'AZIM', 'TVD', 'VSEC', 'ND', 'ED', 'DLS', 'QI', 'DI', 'CI', 'TEMP', 'RUNNO', 'TOOLTYPE', 'CLOSURE', 'ATAZIM', 'COURSE', 'DTVD', 'ROT', 'ROB', 'GX', 'GY', 'GZ', 'BX', 'BY', 'BZ', 'TIME', 'DATE']

lasio.read(file, ignore_data=False).keys()
['MD', 'INCL', 'AZIM', 'TVD', 'VSEC', 'ND', 'ED', 'DLS', 'QI', 'DI', 'CI', 'TEMP', 'RUNNO', 'TOOLTYPE', 'CLOSURE', 'ATAZIM', 'COURSE', 'DTVD', 'ROT', 'ROB', 'GX', 'GY', 'GZ', 'BX', 'BY', 'BZ', 'TIME', 'DATE', 'UNKNOWN:1', 'UNKNOWN:2']
The data is shifted by 2 columns.

@bezova bezova changed the title [bug] wrong additional curves 'UNKNOWN:1', added and fantom columns made additional curves 'UNKNOWN:1', added and fantom columns made May 11, 2020
@kinverarity1
Owner

kinverarity1 commented May 11, 2020

Thanks for the error report! You are right: the main reason this occurs is that there is text in the ~ASCII section, which is not part of the LAS 2 specification.

However, I'm glad to say lasio can still read these files and report the correct set of curves.

It is not clear from the documentation, but what is happening is a combination of a few things lasio does automatically to reduce the number of errors. Firstly, any additional columns it finds in the data section are added as curves named UNKNOWN:1, UNKNOWN:2, and so on. In your case lasio parses the data section incorrectly, splitting each value in the second-to-last column (TIME) at every hyphen, so a value such as 10-15-18 becomes three columns instead of one and two extra curves appear. It does this because of the "read policy" 'run-on(-)', which is intended to handle errors that commonly occur for large numbers in fixed-width LAS files. This read policy is enabled by default:

lasio/lasio/defaults.py

Lines 73 to 82 in b89bf3b

READ_POLICIES = {
    "default": ["comma-decimal-mark", "run-on(-)", "run-on(.)", "run-on(NaN.)"]
}
READ_SUBS = {
    "comma-decimal-mark": [(re.compile(r"(\d),(\d)"), r"\1.\2")],
    "run-on(-)": [(re.compile(r"(\d)-(\d)"), r"\1 -\2")],
    "run-on(.)": [(re.compile(r"-?\d*\.\d*\.\d*"), " NaN NaN ")],
    "run-on(NaN.)": [(re.compile(r"NaN[\.-]\d+"), " NaN NaN ")],
}
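
To see why the TIME column gains two phantom columns, the 'run-on(-)' substitution can be tried on its own. The following is a minimal sketch using only the standard `re` module and the pattern copied from the excerpt above. The helper name `apply_run_on_hyphen` is hypothetical, and the sample line is an assumption: it reuses the last three values of the second data row and assumes the file is whitespace-delimited.

```python
import re

# Pattern and replacement copied from the "run-on(-)" entry of READ_SUBS above.
RUN_ON_HYPHEN = (re.compile(r"(\d)-(\d)"), r"\1 -\2")


def apply_run_on_hyphen(line):
    """Apply the 'run-on(-)' substitution to a single data line (hypothetical helper)."""
    pattern, replacement = RUN_ON_HYPHEN
    return pattern.sub(replacement, line)


# Assumed sample: the last three values of the second data row (BZ, TIME, DATE).
row_tail = "-999.25 04-41-50 03-May-2013"

# Every digit-hyphen-digit sequence gets a space inserted before the hyphen.
tokens = apply_run_on_hyphen(row_tail).split()
print(tokens)
# The time is split into three tokens; the date is untouched because its
# hyphens are not surrounded by digits on both sides.
```

Three values go in and five tokens come out, which is consistent with the two extra curves UNKNOWN:1 and UNKNOWN:2 and with the data appearing shifted by two columns.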

To turn it off, you can manually specify the read policies which should be used:

>>> las = lasio.read(
...     "Bramblett_1-28_WRD_4H_LAS_Standard.las", 
...     read_policy=("comma-decimal-mark", "run-on(.)", "run-on(NaN.)")
... )
>>> las.curves
[CurveItem(mnemonic=MD, unit=ft, value=, descr=Station Depth, original_mnemonic=MD, data.shape=(175,)),
 CurveItem(mnemonic=INCL, unit=deg, value=, descr=Hole inclination, original_mnemonic=INCL, data.shape=(175,)),
 CurveItem(mnemonic=AZIM, unit=deg, value=, descr=Measured Azimuth, original_mnemonic=AZIM, data.shape=(175,)),
 CurveItem(mnemonic=TVD, unit=ft, value=, descr=True Vertical Depth, original_mnemonic=TVD, data.shape=(175,)),
 CurveItem(mnemonic=VSEC, unit=ft, value=, descr=Vertical Section, original_mnemonic=VSEC, data.shape=(175,)),
 CurveItem(mnemonic=ND, unit=ft, value=, descr=North Departure, original_mnemonic=ND, data.shape=(175,)),
 CurveItem(mnemonic=ED, unit=ft, value=, descr=East Departure, original_mnemonic=ED, data.shape=(175,)),
 CurveItem(mnemonic=DLS, unit=deg/100ft, value=, descr=Dog Leg Severity, original_mnemonic=DLS, data.shape=(175,)),
 CurveItem(mnemonic=QI, unit=, value=:                                                        Survey quality, descr=GOOD or BAD versus criteria, original_mnemonic=QI, data.shape=(175,)),
 CurveItem(mnemonic=DI, unit=, value=, descr=Survey Description Index, original_mnemonic=DI, data.shape=(175,)),
 CurveItem(mnemonic=CI, unit=, value=, descr=Survey Station Correction Index, original_mnemonic=CI, data.shape=(175,)),
 CurveItem(mnemonic=TEMP, unit=degF, value=, descr=Direction and Inclination Sensor Temperature, original_mnemonic=TEMP, data.shape=(175,)),
 CurveItem(mnemonic=RUNNO, unit=, value=, descr=Run Number, original_mnemonic=RUNNO, data.shape=(175,)),
 CurveItem(mnemonic=TOOLTYPE, unit=, value=, descr=Survey Tool Type, original_mnemonic=TOOLTYPE, data.shape=(175,)),
 CurveItem(mnemonic=CLOSURE, unit=ft, value=, descr=Total Displacement (of survey measure point from vertical), original_mnemonic=CLOSURE, data.shape=(175,)),
 CurveItem(mnemonic=ATAZIM, unit=deg, value=, descr=Closure Azimuth, original_mnemonic=ATAZIM, data.shape=(175,)),
 CurveItem(mnemonic=COURSE, unit=ft, value=, descr=Course Length, original_mnemonic=COURSE, data.shape=(175,)),
 CurveItem(mnemonic=DTVD, unit=ft, value=, descr=Delta True Vertical Depth, original_mnemonic=DTVD, data.shape=(175,)),
 CurveItem(mnemonic=ROT, unit=deg/100ft, value=, descr=Rate of Turn, original_mnemonic=ROT, data.shape=(175,)),
 CurveItem(mnemonic=ROB, unit=deg/100ft, value=, descr=Rate of Build, original_mnemonic=ROB, data.shape=(175,)),
 CurveItem(mnemonic=GX, unit=ft/s2, value=, descr=Gravity X-Axis Reading, Stationary, original_mnemonic=GX, data.shape=(175,)),
 CurveItem(mnemonic=GY, unit=ft/s2, value=, descr=Gravity Y-Axis Reading, Stationary, original_mnemonic=GY, data.shape=(175,)),
 CurveItem(mnemonic=GZ, unit=ft/s2, value=, descr=Gravity Z-Axis Reading, Stationary, original_mnemonic=GZ, data.shape=(175,)),
 CurveItem(mnemonic=BX, unit=nT, value=, descr=Magnetic X-Axis Reading, Stationary, original_mnemonic=BX, data.shape=(175,)),
 CurveItem(mnemonic=BY, unit=nT, value=, descr=Magnetic Y-Axis Reading, Stationary, original_mnemonic=BY, data.shape=(175,)),
 CurveItem(mnemonic=BZ, unit=nT, value=, descr=Magnetic Z-Axis Reading, Stationary, original_mnemonic=BZ, data.shape=(175,)),
 CurveItem(mnemonic=TIME, unit=, value=, descr=Survey Time(hh-mm-ss), original_mnemonic=TIME, data.shape=(175,)),
 CurveItem(mnemonic=DATE, unit=, value=, descr=Survey Date(DD-MMM-YYYY), original_mnemonic=DATE, data.shape=(175,))]
>>> las.data[:2, :]
array([['0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '28.0',
        '0.0', '-999.25', '-999.25', '-999.25', 'TIP', '0.0', '90.0',
        '-999.25', '-999.25', '-999.25', '-999.25', '-999.25', '-999.25',
        '-999.25', '-999.25', '-999.25', '-999.25', '10-15-18',
        '30-Apr-2013'],
       ['100.0', '0.32', '228.86', '100.0', '0.28', '-0.18', '-0.21',
        '0.32', '9.0', '0.0', '-999.25', '-999.25', 'Run1', 'GyroTool',
        '0.28', '228.86', '100.0', '100.0', '-131.14', '0.32', '-999.25',
        '-999.25', '-999.25', '-999.25', '-999.25', '-999.25',
        '04-41-50', '03-May-2013']], dtype='<U32')
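
Rather than typing the remaining policy names by hand, the tuple can be derived from the default list. This is only a sketch: `DEFAULT_POLICY` is copied from the defaults.py excerpt at commit b89bf3b and may differ in other lasio versions, and `policy_without` is a hypothetical helper, not part of lasio.

```python
# Copied from READ_POLICIES["default"] in the excerpt above (assumption: the
# list may differ in other lasio versions).
DEFAULT_POLICY = ["comma-decimal-mark", "run-on(-)", "run-on(.)", "run-on(NaN.)"]


def policy_without(names_to_drop, default=DEFAULT_POLICY):
    """Return the default read policies minus the named ones, in original order."""
    return tuple(name for name in default if name not in names_to_drop)


# Drop only the hyphen rule that splits hh-mm-ss time values.
read_policy = policy_without({"run-on(-)"})
print(read_policy)

# The result can then be passed exactly as in the example above:
#   las = lasio.read("Bramblett_1-28_WRD_4H_LAS_Standard.las", read_policy=read_policy)
```

Keeping the other three policies means comma decimal marks and the remaining run-on cases are still handled as before.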

There is very little lasio documentation around read_policy, so I will leave this issue open as a reminder to improve it. The relevant sections are:

@kinverarity1 kinverarity1 changed the title additional curves 'UNKNOWN:1', added and fantom columns made additional curves 'UNKNOWN:1', added and phantom columns made May 11, 2020
@kinverarity1 kinverarity1 added the documentation Anything that calls for improvements to the Sphinx docs label May 11, 2020
@kinverarity1
Owner

Note to myself: the documentation also needs to clearly identify and explain why the list of curves may change depending on whether ignore_data is True or False, and how users can strictly control that if they wish, i.e. how to direct lasio to "trust the ~Curves definition and do not attempt to automatically understand what is in the data section".
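
Until that is documented, a user-side check can flag the discrepancy. The sketch below is not a lasio feature: `extra_curves` is a hypothetical helper that compares the mnemonics from a header-only read with those from a full read, using the two key lists from the original report as sample input.

```python
def extra_curves(header_keys, full_keys):
    """Return mnemonics found in the full read but not declared in the header."""
    declared = set(header_keys)
    return [key for key in full_keys if key not in declared]


# Sample input taken from the report: keys with ignore_data=True ...
header_keys = ['MD', 'INCL', 'AZIM', 'TVD', 'VSEC', 'ND', 'ED', 'DLS', 'QI',
               'DI', 'CI', 'TEMP', 'RUNNO', 'TOOLTYPE', 'CLOSURE', 'ATAZIM',
               'COURSE', 'DTVD', 'ROT', 'ROB', 'GX', 'GY', 'GZ', 'BX', 'BY',
               'BZ', 'TIME', 'DATE']
# ... and keys with ignore_data=False.
full_keys = header_keys + ['UNKNOWN:1', 'UNKNOWN:2']

phantom = extra_curves(header_keys, full_keys)
if phantom:
    print(f"Data section produced {len(phantom)} undeclared column(s): {phantom}")
```

With a real file, the two lists would come from lasio.read(file, ignore_data=True).keys() and lasio.read(file).keys(), as in the report above. A non-empty result suggests the data section was parsed into more columns than the ~Curves section declares.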

@kinverarity1 kinverarity1 changed the title additional curves 'UNKNOWN:1', added and phantom columns made Additional curves 'UNKNOWN:1' added and phantom columns made Apr 9, 2021
@dcslagel dcslagel self-assigned this Sep 6, 2021