Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Location table changes (add region_concept_id) #252

Closed
pavgra opened this issue Mar 13, 2019 · 10 comments
Closed

Location table changes (add region_concept_id) #252

pavgra opened this issue Mar 13, 2019 · 10 comments
Labels

Comments

@pavgra
Copy link

pavgra commented Mar 13, 2019

Location table changes (add region_concept_id)

Proposal Owner: Pavel Grafkin, Gowtham Rao

Discussion: #220, OHDSI/WebAPI#649

Proposal overview:

  • Add region_concept_id column to location table

Description

We would like to use administrative areas as part of cohort entry event, cohort inclusion criteria, enhance cohort characterization using the areas, create heatmaps and do clustering based on the areas.

To achieve this, as a first step, Geo Vocabularies were proposed (OHDSI/WebAPI#649 (comment)) and implemented (OHDSI/Vocabulary-v5.0#207). The geo vocabularies represent ontologies of administrative areas (e.g. Country -> State -> County -> Township).

The second necessary step is linkage of records in location table to the administrative areas. Even though the relations of geo concepts and locations represent derived information, not all of OHDSI supported DBs have geo capabilities (and therefore we cannot compute the relations between a location and geo concepts inside DB - OHDSI/WebAPI#649 (comment)) plus the computation of the relations is pretty compute intensive to do it on-demand, therefore there is a need of pre-calculation and physical reference storage. We need to store only a link from a location to its lower-level administrative area (concept_id). Other, higher level administrative areas, can be retrieved via concept_ancestor.

Location to administrative area pre-calculation

(using Postgres)

  1. Load a CSV with concept_id to polygon relations (OSM v1 release code Vocabulary-v5.0#207 (comment)) into the table:
CREATE TABLE area_polygon (
    concept_id INTEGER,
    polygon VARCHAR
);

(where polygon stores GeoJSON)

  1. Precalculate relations between LOCATION and lowest level areas:
CREATE INDEX area_polygon_geom ON area_polygon USING GIST (ST_SetSrid(ST_GeomFromGeoJSON(area_polygon.polygon), 4326));
CREATE INDEX location_point ON location USING GIST (ST_SetSrid(ST_MakePoint(longitude, latitude), 4326));

WITH lowest_level_areas AS (
  SELECT area.*
  FROM area_polygon area
  WHERE NOT EXISTS (SELECT * FROM concept_ancestor WHERE ancestor_concept_id = area.concept_id AND min_levels_of_separation > 0)
),
loc_area AS (
  SELECT l.location_id, area.concept_id
  FROM location l
    JOIN lowest_level_areas area ON ST_CONTAINS(ST_SetSrid(ST_GeomFromGeoJSON(area.polygon), 4326), ST_SetSrid(ST_MakePoint(l.longitude, l.latitude), 4326))
)
UPDATE location
SET region_concept_id = la.concept_id
FROM loc_area la
WHERE location.location_id = la.location_id;

Proposed implementation
#251

@cgreich
Copy link
Contributor

cgreich commented Mar 14, 2019

Nice calculation, but we will not be able to always assign the lowest level geographical information, because we may not have that detail.. E.g., if all we know the Location is Massachusetts, then the town of Rutland in Worcester County is the center point. However, the true Location could be in any other place in Massachusetts. The ETLer has to decide what level in the geo hierarchy to pick based on the information in the source data.

@pavgra
Copy link
Author

pavgra commented Mar 14, 2019

Obviously, my demo script assumes that you have precise latitude and longitude (the most accurate case). Otherwise, there may be a variety of cases requiring individual approach and custom logic during ETL.

@clairblacketer
Copy link
Contributor

This will be discussed at the 4/9/2019 CDM Meeting

@pavgra
Copy link
Author

pavgra commented Apr 10, 2019

@clairblacketer , so what are the next steps to get the appropriate changes into develop?

@clairblacketer
Copy link
Contributor

Hi @pavgra here is the link to the vote. I will keep it open until next Tuesday and then I will add it to the dev branch if it passes.

@pavgra
Copy link
Author

pavgra commented Apr 17, 2019

@clairblacketer , since this proposal received more than 90% of support, can we proceed with merging it into the development version of CDM?

@clairblacketer
Copy link
Contributor

@pavgra this was added to the development branch

@vojtechhuser
Copy link
Collaborator

please post where can I find in "dev wiki" the location table revised description (per this proposal) (not just the DDL code)

@clairblacketer
Copy link
Contributor

@vojtechhuser you caught me! I have not yet added a "dev wiki" with descriptions of additions, though it is on my list and I will post a link as soon as it is ready.

@clairblacketer
Copy link
Contributor

Closing this as these changes were proposed prior to us adding country and latitude and longitude to the location table. This can be reopened if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants