Thanks @kuatroka, this is a good point, and thanks for pointing out that some of the files are in a different format that doesn't contain XML. I agree that, for the time being, processing can be skipped for files that are not in XML form. I have mostly been testing with data from recent quarters. I'm interested to hear why you think this is important.
Hi @briancaffey. Here is an example of code that parses a non-XML 13F filing. It only works for most of the .txt-based filings from Berkshire Hathaway. Unfortunately, other companies use different variations, and that's the problem: somehow these variations have to be accounted for. Just remember that I'm not a good dev, so this code is veeeeery childish, but some of it works :)
Also, rename the output .json at line 212 for each new .txt file so the outputs don't get overwritten.
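Instead of renaming the output by hand for every run, the .json name could be derived from the input file name. This is a minimal sketch of that idea; `output_path_for`, `save_rows` and the `parsed/` folder are hypothetical names, not taken from the original script.

```python
import json
from pathlib import Path


def output_path_for(txt_path, out_dir="parsed"):
    """Hypothetical mapping: filings/example.txt -> parsed/example.json."""
    return Path(out_dir) / (Path(txt_path).stem + ".json")


def save_rows(txt_path, rows, out_dir="parsed"):
    """Write the parsed rows for one filing to its own .json file."""
    out = output_path_for(txt_path, out_dir)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the output folder on first use
    out.write_text(json.dumps(rows, indent=2))
    return out
```

Because every input .txt gets its own output name, looping over a folder of filings no longer overwrites a single .json.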
Hi Brian. I need to get my head around how to properly explain the new features and, more importantly, why they are needed. I'll try to do that tomorrow. Meanwhile, I'd like to comment on something you already mentioned in the issue's thread.
You said the code treats all the downloaded filing files in the same way: by parsing their XML structure.
The big problem is that not all files have an XML structure; in fact, most of them don't. Between roughly 1999 and 2012 they are just .txt files with no helpful tags for extracting the tables. XML only began to be used sometime in 2012/2013. Unfortunately, I don't know exactly when the SEC made XML mandatory.
You can check this by downloading or looking at any 13F file from 1999 or 2000.
This is one of the big hurdles (in my opinion).
Another complication is that even these non-XML files differ from one another depending on the company filing them.
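One possible way to cope with company-specific layouts is to stop relying on fixed column positions and look for a pattern every holdings row should share. The sketch below assumes each row contains a 9-character CUSIP followed by two numeric columns (treated here as value and shares). The regex and `parse_text_rows` are hypothetical; CUSIPs written with spaces, wrapped rows, or different column orders would still need per-filer handling.

```python
import re

# Hypothetical row pattern: a 9-character CUSIP (8 alphanumerics + a numeric check digit)
# as a standalone token, then two numeric columns that may contain thousands separators.
ROW_RE = re.compile(r"\b([0-9A-Z]{8}[0-9])\s+(\d[\d,]*)\s+(\d[\d,]*)")


def parse_text_rows(text):
    """Return one dict per line that looks like a holdings row; skip everything else."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.search(line)
        if match is None:
            continue  # cover page text, column headers, totals without a CUSIP, etc.
        cusip, value, shares = match.groups()
        rows.append({
            "issuer_and_class": line[:match.start()].strip(),  # text before the CUSIP
            "cusip": cusip,
            "value": int(value.replace(",", "")),
            "shares": int(shares.replace(",", "")),
        })
    return rows
```

Matching on the CUSIP rather than on column widths means rows still parse when a filer shifts columns or changes spacing, but anything that breaks the assumed "CUSIP, value, shares" order will be silently skipped, so logging skipped lines would be worth adding.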
For now, I think a good idea would be to identify which files are XML and which are not, so your parsing code is applied only to the XML files. Instead of erroring out when it sees the old format, it could print a message or write a log entry such as "old file".
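A rough sketch of that check, assuming that XML-era 13F filings contain an `informationTable` element (sometimes with a namespace prefix such as `ns1:`) while the old plain-text filings do not. `is_xml_filing`, `process_filing` and the `parse_xml` callback are hypothetical names standing in for the existing parsing code.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumption: XML information tables open with <informationTable ...> or <prefix:informationTable ...>.
INFO_TABLE_RE = re.compile(r"<(?:\w+:)?informationTable\b", re.IGNORECASE)


def is_xml_filing(text):
    """True if the filing text appears to contain an XML information table."""
    return INFO_TABLE_RE.search(text) is not None


def process_filing(path, text, parse_xml):
    """Run the XML parser only on XML filings; log and skip the old text format."""
    if not is_xml_filing(text):
        logger.info("old file (non-XML 13F), skipping: %s", path)
        return None
    return parse_xml(text)
```

Checking the content rather than the filing date avoids having to know exactly when the SEC switched to XML.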
It's a big topic, and I'll share more ideas on it and on why it's important (IMHO) to parse the old data somehow too.