Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No records on conversion #8

Open
brianckeegan opened this issue Feb 20, 2018 · 3 comments
Open

No records on conversion #8

brianckeegan opened this issue Feb 20, 2018 · 3 comments

Comments

@brianckeegan
Copy link

brianckeegan commented Feb 20, 2018

I'm working with the National Highway Transportation and Safety Administration's Fatality Analysis Reporting Systems which reports its data in DBF format from 1975-2016. The data starting in 2010 ( ftp://ftp.nhtsa.dot.gov/fars/2010/DBF/FARS2010.zip ) is read without error and includes appropriate-looking Dbf5 attributes, but none of the conversion methods produce data.

import pandas as pd
from simpledbf import Dbf5
dbf = Dbf5('accident.dbf',codec='latin1')
print("Number of records (raw): {0:,}".format(_dbf.numrec))
# 29,867
print("Number of records (DataFrame): {0:,}".format(len(_dbf.to_dataframe())))
# 0

Using other methods like to_csv and other codec encodings produces the same behavior: no data produced. I've reproduced this issue on both Mac and PC on Anaconda 4 running Python 3.6 in Jupyter Notebook.

@rnelsonchem
Copy link
Owner

@brianckeegan I confirmed your observation, and I found the source of your problems.

If you look at the _get_recs method the first byte is tested to see if the record is deleted. According to the specs, that first byte should be a space. If not, it is assumed that the record was deleted.

I modified my local version of the simpledbf file to print('No rec') if that flag is triggered, and when I try to convert that particular DBF file to a DataFrame, I get a bunch of "No rec"s printed.

My suggestion would be to comment out those two lines of your local simpledbf install; however, you might want to ping the folks that made that file to find out why all of the records are marked as deleted... Hope that helps.

@rnelsonchem
Copy link
Owner

If modifying your install is not in the cards, I was able to open that file with Excel 2016, and it seemed like all of the records were there. It is interesting, though, that Excel does not give any indication that the records were marked for deletion as per the specs...

Perhaps I'll need to add a "show deleted records" flag to the code... Once we switch over to 30h days, I'll use that new time to make some updates :)

@brianckeegan
Copy link
Author

Thanks for looking into this. I don't think I'll be able to convince anyone at NHTSA to alter these data, but it would be great if there was an option when initializing in Dbf5 or using to_dataframe() to include all records, even those flagged as deleted like this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants