Cost table changes (add PERSON_ID, dates and normalize) #81
Comments
~~@cgreich next CDM meeting?~~
changed proposal from cost_domain_concept_id to cost_source_table_concept_id |
You mean you are using numeric IDs instead of the string ones to gain performance? |
Yes. That's correct. We felt that the integer representations lead to a performance gain, make joins efficient, and are probably the reason why OMOP CDM is so fast. I don't have benchmarks, but compared to text, I'm pretty sure there is an exponential gain. Any reason for the text? Cost tables can go to millions/billions of rows. |
I agree with using Concept IDs instead of the text due to better performance. It also applies to other columns in the CDM representing an ID, like Relationship_ID, Class_ID, etc. I prefer using integer instead of text as it would follow database design best practices; however, it makes the tables less human readable before joining the tables with the description of the IDs. |
You are totally right. But: We went from numeric IDs to alphanumeric ones for the vocabulary_id, domain_id, relationship_id, concept_class_id. Reason is that it is so much easier to do manual queries (you otherwise have to constantly join the reference tables vocabulary, relationship, concept_class etc.) and these are not that many, so you can just index by them (and essentially turn them internally into sequential numerical IDs). |
@cgreich I think it is fine for the vocabulary tables because they are relatively small and don't run over billions of rows. But the clinical tables run over billions of rows, and using text as an ID is an inefficient use of the database space. Plus, text doesn't support MPP systems, so you can't shard the data over many nodes. We need to use integers like @abedtashh said. For the users who want to use 'human readable' alphanumeric ones, they can always do: NEW
so they don't have to memorize the concept-id's, and can retain the human readable. Current
|
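The code that followed "they can always do" did not survive in this thread. One way to keep integer IDs in the table while still giving users readable labels is a view that joins to the concept table. The following is only a minimal sketch of that idea, using an in-memory SQLite database; the simplified table layouts, the view name `cost_readable`, and the concept IDs are assumptions made for illustration and are not real OMOP vocabulary entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the OMOP concept and cost tables (assumption).
CREATE TABLE concept (
    concept_id   INTEGER PRIMARY KEY,
    concept_name TEXT NOT NULL
);
CREATE TABLE cost (
    cost_id                      INTEGER PRIMARY KEY,
    cost_source_table_concept_id INTEGER NOT NULL REFERENCES concept (concept_id),
    cost                         REAL
);
-- Hypothetical concept IDs, invented for this sketch.
INSERT INTO concept VALUES (9000001, 'visit_occurrence'), (9000002, 'drug_exposure');
INSERT INTO cost VALUES (1, 9000001, 120.0), (2, 9000002, 35.5);

-- The view exposes the readable name; the base table keeps the integer ID.
CREATE VIEW cost_readable AS
SELECT c.cost_id, k.concept_name AS cost_source_table, c.cost
FROM cost AS c
JOIN concept AS k ON k.concept_id = c.cost_source_table_concept_id;
""")

readable_rows = conn.execute(
    "SELECT cost_id, cost_source_table, cost FROM cost_readable ORDER BY cost_id"
).fetchall()
print(readable_rows)
```

Users who prefer names query the view, while joins and storage on the large table still use the integer column.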
@cgreich @abedtashh @fdefalco what are your thoughts on 'cost_type_concept_id and remove cost type columns'? From an actuarial, economic, and financial analytics perspective, this would really help because we can now support incurred_date, paid_date, billed_date AND we can support many different cost types using standardized concept_ids. @cgreich I would like to discuss this at the next CDM vocab workgroup, if possible. |
Hi @gowthamrao you have been added to the agenda for 9/5 |
Actually, bring up both: The numerical ID and the cost. Even though I am not getting what you mean by "cost_type_concept_id and remove cost type columns" |
Will discuss them in the workgroup, but for those who read in advance: remove the wide columns that represent cost types. The proposal is to convert the table from wide to long, i.e. currently we have certain types of costs represented as columns in the cost table. We can't keep adding more columns, so we need to make this table long form by creating a cost type concept_id. That way we can have a 'source_concept_id' and a 'standardized_concept_id', and do standardized/distributed OHDSI cost studies AND support local studies. |
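The wide-to-long conversion described above can be sketched as a small unpivot step. This is an illustrative sketch only: the column names `total_charge`, `paid_patient_copay` and `paid_patient_coinsurance` come from the existing wide COST table, `cost_concept_id` is the name used in the later version of the proposal, and the concept IDs and the helper `wide_to_long` are hypothetical.

```python
# Hypothetical mapping from wide cost columns to cost-type concept IDs.
# The IDs are invented for this sketch and are not real OMOP concepts.
COST_COLUMN_TO_CONCEPT_ID = {
    "total_charge": 9100001,
    "paid_patient_copay": 9100002,
    "paid_patient_coinsurance": 9100003,
}


def wide_to_long(wide_rows):
    """Turn each wide cost row into one long row per populated cost column."""
    long_rows = []
    for row in wide_rows:
        for column, concept_id in COST_COLUMN_TO_CONCEPT_ID.items():
            amount = row.get(column)
            if amount is None:
                continue  # the long table has no row for a missing cost type
            long_rows.append(
                {
                    "cost_event_id": row["cost_event_id"],
                    "cost_concept_id": concept_id,
                    "cost": amount,
                }
            )
    return long_rows


wide_rows = [
    {"cost_event_id": 10, "total_charge": 200.0, "paid_patient_copay": 20.0,
     "paid_patient_coinsurance": None},
    {"cost_event_id": 11, "total_charge": 75.0, "paid_patient_copay": None,
     "paid_patient_coinsurance": 7.5},
]
long_rows = wide_to_long(wide_rows)
for long_row in long_rows:
    print(long_row)
```

In the long form, supporting a new cost type means adding a concept rather than a column.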
@cgreich @abedtashh another point in favor of using integers for the *_domain_id is we are using integers for _domain_id in FACT_RELATIONSHIP table |
Why is that any better? |
@cgreich why use concept_id vs concept_name? Because concept_ids are integers and concept_names are text, and integers are more efficient. |
@clairblacketer is there a way for me to edit the proposal? |
Hi @gowthamrao unfortunately not within github itself. However, you can either send me your edits and I can add them here or you can create a new issue with your edits referencing this one. |
@clairblacketer then could you please make the following changes to the proposal:
cost_concept_id relies on new, to-be-created concept_ids
|
@gowthamrao done! |
Anybody know the rationale behind leaving revenue_code/drg in cost_table? I think observation table with ability to link to visit tables and cost (via visit table) is the right place for revenue_code/drg |
@clairblacketer after discussing with @cgreich, I want to change the proposal as follows. Withdraw the field cost_domain_concept_id and replace with cost_record_table_concept_id.
Explanation: We need an alternative, because the current cost CDM uses cost_domain_id, which is as dirty as _domain_concept_id but is also inefficient because it is a text search instead of an efficient integer search. To solve linkage with clinical-event tables (visit_occurrence_id -- cost_event_id or condition_occurrence_id -- cost_id) we are proposing a new field called cost_source_table_concept_id. This proposal is both a new field in the Cost CDM and a new vocabulary.
New vocabulary: For every table in the OMOP CDM we will need to create a new concept_id. See attached. It will be a one-time task of inserting into the OMOP vocabulary, with maintenance when a new table is added to the OMOP CDM. See proposed concepts for event_table_concept_id.
Advantages: The cost table will now be linkable to any clinical event table using a combination of cost_source_table_concept_id and cost_event_id, where cost_event_id is the FK to the PK of the table represented in the cost_source_table_concept_id. This will make the queries faster and cleaner. There are other advantages, e.g. the cohort table. The subject_id in the cohort table may be the person_id, provider_id, visit_occurrence_id, or visit_detail_id, as a cohort may be more than just a person (although we currently use it almost exclusively as a person). Adding this table concept_id may (though this is not part of this proposal) allow us to create cohorts that are linkable to the relevant table using a combination of subject_id and _source_table_concept_id. |
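The linkage described here, cost_source_table_concept_id identifying the event table and cost_event_id holding that table's primary key, can be sketched with an in-memory SQLite database. This is a minimal sketch under stated assumptions: the table layouts are heavily simplified, and the concept IDs 9000001 and 9000002 are hypothetical placeholders for the to-be-created table concepts.

```python
import sqlite3

# Hypothetical concept IDs standing in for the proposed per-table concepts.
VISIT_TABLE_CONCEPT_ID = 9000001
DRUG_TABLE_CONCEPT_ID = 9000002

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id           INTEGER NOT NULL
);
CREATE TABLE cost (
    cost_id                      INTEGER PRIMARY KEY,
    cost_source_table_concept_id INTEGER NOT NULL,
    cost_event_id                INTEGER NOT NULL,
    cost                         REAL
);
INSERT INTO visit_occurrence VALUES (10, 1), (11, 2);
""")
conn.executemany(
    "INSERT INTO cost VALUES (?, ?, ?, ?)",
    [
        (1, VISIT_TABLE_CONCEPT_ID, 10, 100.0),
        (2, VISIT_TABLE_CONCEPT_ID, 10, 20.0),
        # Same cost_event_id, but it points into a different table,
        # so it must not be attributed to visit 10.
        (3, DRUG_TABLE_CONCEPT_ID, 10, 5.0),
        (4, VISIT_TABLE_CONCEPT_ID, 11, 50.0),
    ],
)

# Filter on the integer table concept, then join on the event key.
visit_costs = conn.execute(
    """
    SELECT v.visit_occurrence_id, v.person_id, SUM(c.cost) AS total_cost
    FROM cost AS c
    JOIN visit_occurrence AS v ON v.visit_occurrence_id = c.cost_event_id
    WHERE c.cost_source_table_concept_id = ?
    GROUP BY v.visit_occurrence_id, v.person_id
    ORDER BY v.visit_occurrence_id
    """,
    (VISIT_TABLE_CONCEPT_ID,),
).fetchall()
print(visit_costs)
```

The drug-related row shares cost_event_id 10 with a visit, which is why the pair of columns, not cost_event_id alone, is needed to resolve the link.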
Cost Table Changes (Add Person_id, Dates and normalize)
Proposal Owner: Gowtham Rao, Chris Knoll, Klaus Bonadt
Discussion: forum post
Cost table description: COST
Proposal overview:
1. Add Person_id to Cost table
2. Add billed_date, paid_date: Add two new date fields in the cost table, billed_date and paid_date
Use cases:
Analytic questions:
Importance:
Consequence of doing it:
Consequences of not doing it:
3. Change Cost table structure from wide to tall by using cost_concept_id and cost_source_concept_id
Proposed structure of new table is below
4. Add new field event_table_concept_id
This field will allow cost table to be linked to any OMOP CDM table for which cost is being represented. cost_table_concept_id will be used to infer the source of the cost information thru vocabulary look-up and then joining by the respective pk. e.g. if the event_table_concept_id points to visit_occurrence table, then cost_event_id = visit_occurrence_id. See attached concepts spreadsheet with proposed new OMOP concepts
|
Whenever possible, the description for '_concept_id' fields should include the vocabulary or domain to help the ETL get the correct concepts. The description for event_source_table_concept_id should state that this is for a new set of concepts and, since the proposal cannot be accepted without the concepts being defined, it can also give the new vocabulary id. BTW, would 'event_table_concept_id' not be a better name? When I see '_source_concept_id' I then look for a '_concept_id' (e.g. the drug_concept_id, drug_source_concept_id pairing). If the cost_concept_id refers to a new set of concepts, the proposal should say that and again provide the vocabulary name. I also think the new concepts need a definition for those of us who are not so closely tied to the payer space but still need to implement the ETL. It was fairly simple at one time because most of the column names in claim tables matched the names in the cost table, but that does not apply to EHR systems. Also, is 'Paid by all payers' a summary, or is this a value that is likely to exist in a claims database? Are we moving to always use datetime? cost_source_concept_id: I do not understand this column. Assuming it is the 'source' that corresponds to the cost_concept_id, what concepts are these going to point to? In the proposed concepts in Concepts.xlsx, I think the domain should be 'Metadata' instead of 'Type Concept'. |
Recap from the Oct 3rd 2017 CDM workgroup, forums and symposium discussion
|
@don-torok |
@don-torok is this what you were recommending? Cost concepts |
Yes and no (: Co-payment is the fixed amount that is paid for a covered health care service. This payment is usually in addition to deductible. Also known as 'copays'. Copayments may differ based on services within the same plan, like drugs, lab tests, and visits to specialists. It supplies a brief description, gives a couple of options for how the data may be described in the source documentation (co-payment or copays), and it tells me it is not the same as a deductible. However, this part of the description offers little help in doing the ETL defining the attribute in the CDM. Another example of where a detailed definition is needed is to distinguish between Pharmacy ingredient (the amount charged by the wholesale distributor or manufacturer) and Average Wholesale. I would like sufficient information to help me determine if a column labeled 'Wholesale cost' should go into Pharmacy ingredient or Average Wholesale. |
Recap from CDM workgroup on Nov 7th 2017
|
@don-torok please check the updated spreadsheet here Please provide input on cost concepts being proposed. |
added in v6.0 |
I know this thread is old and closed, but I do not see the specs for the updated COST table in the V6.0 documentation, specifically here: https://ohdsi.github.io/CommonDataModel/cdm60.html#COST. @clairblacketer , @gowthamrao - is this just a mistake? Where can I get updated specs for the cost table for v6? Thanks! |
What's missing, @jenniferduryea? |
I see the problem @cgreich @jenniferduryea. There is no person_id or dates as the issue suggests. I’ll push a fix in the next few days. |
@clairblacketer and @cgreich both COST tables under the v6.0 and v5.3.1 specs are the same, even though the table has been converted from Wide to Long (unless you kept it wide? which was not my impression). The accompanying wording for ETL convention and description of the table does not reflect the "wide to long" conversion. Also, v6 still shows columns for paid_patient_copay, paid_patient_coinsurance, etc which I believe are now stored under the cost.cost_concept_id field for V6. Basically, the whole COST table has not been updated in the V6 spec from V5.3.1 to reflect all of the changes above. |
Labelled as Documentation to be fixed in the 2021 CDM Hackathon. Final Decision: Fix the COST table in the current v6.0 specs and moving forward with v5.4. The correct specification can be found here. |
* Add github actions workflow to build package and run tests. * update Description file * rename .Rproj file. * Consolidate 'create' functions into one file. * Add tests for create functions. * update description * removed spaces in file and folder names. Regenerated ddl output. Tried to fix Field_Level.csv file. * consolidate write functions into one file. Add execute function. * update docs * add tests for write and execute functions * update documentation * Add windows and linux runners in github actions. * update github actions * download drivers before running tests * fix small error in setup test file. * debug github actions * debug github actions * debug github actions * debug github actions * fix tiny bug * comment out execute ddl test * fix bug in test * Add execute test back in * revert accidental change in description * add print statement for debugging schema error on github actions. * Fix schema environment variable name * Add comment to github actions workflow file. * remove placeholder text in function documentation. * Rename createdDdl.R to createDdl.R * Hack-a-thon updates Closes #81, #387, #239, #412, #391, #330, #408, #365, #306, #264 * Changed bigint to integer for consistency * Updated DDLs * Add tests for redshift. Clean up test setup file. * Foreign key fixes * Add imports and update docs. * Fix bug in setup test script. * update setup file * Add tests for oracle and sql server. Move setup.R file. * fix bug in setup * debug tests on github * debug github actions * debug actions. * debug actions * debug actions. * Add missing secrets to yaml!! * debug actions * test connection on all platforms * add ddl execution * add windows and linux runners Co-authored-by: Adam Black <[email protected]> Co-authored-by: Clair Blacketer <[email protected]>
Please see Gowtham's updated Cost table proposal in the comments below
Cost Table Changes (Add Person_id, Dates and normalize)
Proposal Owner: Gowtham Rao, Chris Knoll, Klaus Bonadt
Discussion: forum post
Cost table description: COST
Proposal overview:
1. Add Person_id to Cost table
2. Add incurred_date, incurred_datetime, billed_datetime, paid_datetime:
Use cases:
Analytic questions:
Importance:
Consequence of doing it:
Consequences of not doing it:
3. Leverage cost_type_concept_id and remove cost type columns
4. Add new field cost_domain_concept_id