RabbitInAHat fails to load a custom model if BOM is set #411
Comments
@BillCM is it possible to attach the CSV file that causes the problem to this issue (or a stripped-down one, as long as it causes the same problem)? This can save me some time when building a test case. Thanks,
@janblom Correct. CSVs exported from Excel have the Byte Order Mark set. The only way to make RabbitInAHat read the file is to remove the BOM by changing the encoding. Perhaps this is worth a note in the docs?
Hi @BillCM, thank you for reporting this issue. I have already prepared a fix which adds flexibility, so that RabbitInAHat can read CSVs with and without a BOM. This will be part of the upcoming 1.0 release. (Unfortunately, testing another aspect of that release is taking some time.) Since this issue will be fixed, it is not necessary to update the docs; this issue will serve as the temporary documentation until the fix is released and the issue closed. (The fix is in my employer's public repo until I have it approved and merged into the OHDSI repo.) If possible, could you attach a CSV to this issue that I can use to reproduce the bug? While I am fairly confident that the upcoming fix will cover your case, there is nothing better than having the certainty :-) Thanks,
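The actual fix lives in the WhiteRabbit repository; purely as an illustration of the idea (not the project's actual code), a reader that tolerates an optional UTF-8 BOM can be sketched with plain JDK classes:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: open a UTF-8 stream and silently skip a leading BOM,
// so files exported "UTF-8 with BOM" and plain "UTF-8" both parse the same way.
public class BomAwareReader {
    // A UTF-8 BOM decodes to the single character U+FEFF
    private static final char BOM = '\uFEFF';

    public static Reader open(InputStream in) throws IOException {
        PushbackReader reader = new PushbackReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        if (first != -1 && first != BOM) {
            reader.unread(first); // no BOM: put the first character back
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a file exported from Excel: BOM bytes EF BB BF, then the header
        byte[] withBom = "\uFEFFtable,field".getBytes(StandardCharsets.UTF_8);
        BufferedReader r = new BufferedReader(open(new ByteArrayInputStream(withBom)));
        System.out.println(r.readLine()); // prints "table,field", BOM stripped
    }
}
```

A CSV parser fed through such a reader sees identical input whether or not the exporting tool wrote a BOM.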
@janblom I think this very issue is causing the build to break. It appears that the embedded CSVs for CDM5.0 and CDM5.1 and their stem models are all being identified as Excel-encoded with a BOM. This causes the mvn build to fail for the main branch on my machine.
Upon opening the code in IntelliJ, the 4 files in question are linked to the Excel icon and will not open for editing. After converting the files to UTF-8, the build works.
I am unable to reproduce the last report, on both Linux and MacOS. Could it be that the CSV files involved were inadvertently changed? I suspect an encoding problem (a setting on your machine, such as the locale), but I am unable to verify that. Since this is very likely not related to the issue first reported here, I will not investigate it further in this context. If you do think this is a problem of the WhiteRabbit project, please report it in a separate issue. It is in any case not related to the first problem reported in this issue (I was able to confirm that). The original problem is now fixed in the
A fix for the first issue reported in this thread is included in the second release candidate of version 1.0.0.
* Create release 1.0.0
* Enforce consistent ordering of the tables in the scan report (solves issue #236)
* Snowflake: always use database and schema when accessing table (meta)data. Fixes issue #409
* Update Snowflake JDBC version and activate+fix Snowflake integration tests
* Upgrade dependency, testcontainer version and fix MSSqlServer integration test
* Only run Snowflake integration tests when a Snowflake access configuration is available
* Switch to SQL for obtaining field metadata for Snowflake (default; JDBC can still be used through a system property or env. var)
* Fix for #411 (can't process custom models with UTF-8 BOM in csv file)
* Better method naming and clearer logging for SnowflakeHandler
* Add UTF BOM handling when reading csv's
* Change to ojdbc8 version 19.23.0.0 (for Oracle). Different (sub)repo, more recently published; solves issue #415
* Avoid testing results for integration test with externally loaded BigQuery JDBC jar: makes the setup simpler
Solved in WhiteRabbit version 1.0.0.
Describe the bug
I created a custom model in Excel (XLSX) and exported it to CSV. This file failed to load and resulted in this error:

    java.lang.IllegalArgumentException: Mapping for table not found, expected one of [table, field, required, type, schema, description]
        at org.apache.commons.csv.CSVRecord.get(CSVRecord.java:121)
        at org.ohdsi.rabbitInAHat.dataModel.Database.generateModelFromCSV(Database.java:117)
        at org.ohdsi.rabbitInAHat.RabbitInAHatMain.doSetTargetCustom(RabbitInAHatMain.java:465)
        at org.ohdsi.rabbitInAHat.RabbitInAHatMain.lambda$createMenuBar$9(RabbitInAHatMain.java:268)
The problem is that CSV parsing does not account for the Byte Order Mark (BOM).
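To see why this fails: when the decoder is not told to skip the BOM, the BOM becomes part of the first header cell, so the cell no longer matches the expected header name "table". A minimal demonstration (illustrative only, not WhiteRabbit code):

```java
import java.nio.charset.StandardCharsets;

// Shows that a UTF-8 BOM prefixed to the first header cell makes it
// decode as "\uFEFFtable" rather than "table", breaking header lookup.
public class BomHeaderDemo {
    public static void main(String[] args) {
        // BOM bytes EF BB BF followed by the ASCII text "table"
        byte[] firstCell = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 't', 'a', 'b', 'l', 'e'};
        String header = new String(firstCell, StandardCharsets.UTF_8);
        System.out.println(header.equals("table"));            // false: BOM is part of the string
        System.out.println(header.equals("\uFEFF" + "table")); // true
    }
}
```

A header map built from such input contains the key "\uFEFFtable", so a lookup of "table" throws the IllegalArgumentException shown above.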
To Reproduce
Steps to reproduce the behavior:
Expected behavior
CSV opens correctly.
Workaround
Open the exported CSV in a text editor and change the encoding from "UTF-8 with BOM" to "UTF-8".
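The same workaround can be scripted when many files are affected. A small stand-alone sketch (a hypothetical helper, not part of WhiteRabbit) that drops a leading UTF-8 BOM from raw file bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF)
// from raw file content, returning the content unchanged if no BOM is present.
public class StripBom {
    static byte[] stripBom(byte[] bytes) {
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(bytes, 3, bytes.length);
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Simulated Excel export: BOM followed by a custom-model header line
        byte[] exported = "\uFEFFtable,field,required".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(stripBom(exported), StandardCharsets.UTF_8));
    }
}
```

Combined with java.nio.file.Files.readAllBytes and Files.write, this can rewrite an exported CSV in place as plain UTF-8.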
Additional context
The issue