-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmanuscript.tex
1280 lines (987 loc) · 97.8 KB
/
manuscript.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[preprint,3p,onecolumn,times,review]{article}
%\documentclass[preprint,3p,onecolumn,times,review]{elsarticle}
%\documentclass[preprint,3p,twocolumn,times]{elsarticle}
\usepackage[cmex10]{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{graphicx}
\usepackage{bussproofs}
\usepackage{booktabs}
\usepackage{soul}
\usepackage{color}
\soulregister\cite7
\soulregister\ref7
\newcommand{\removed}[1]{\st{#1}}
\usepackage{subfigure}
\usepackage{listings}
\usepackage{fancyvrb}
\usepackage{stmaryrd}
\renewcommand{\theenumi}{(\roman{enumi})}
\title{An analysis of structural interoperability \\in {ISO 13606} based on typed lambda calculus}
% \author[plan]{Akimichi Tatsukawa\corref{cor1}}
% \ead{[email protected]}
% \address[plan]{Department of Planning, Information and Management, The University of Tokyo Hospital, Tokyo, Japan.}
% \author[disease]{Emiko Shinohara}
% \address[disease]{Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.}
% \author[plan]{Takeshi Imai}
% \author[plan]{Yoshimasa Kawazoe}
% \author[plan]{Kazuhiko Ohe}
% \cortext[cor1]{Corresponding author. Address: Department of Planning, Information and Management, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, Fax: +81 3 3813 7238\\
% \qquad Email address: [email protected]}
\newtheorem{thm}{Theorem}
\newtheorem{lem}[thm]{Lemma}
%\newdefinition{rmk}{Remark}
%\newproof{pf}{Derivation}
\newcommand{\HL}[1]{\hl{\mbox{#1}}}
\begin{document}
\maketitle
%\begin{frontmatter}
\begin{abstract}
ISO 13606 defines a standard specification for the exchange and integration of electronic health record (EHR) data with the goal of interoperability of EHR data across heterogeneous systems.
In order to achieve this goal, the ISO 13606 standard provides a dual-model architecture in which archetypes are extensible schemas of EHR data via their specialization and composition mechanisms.
However, the ISO 13606 standard lacks the formal semantics of archetypes, which makes it difficult to build EHR systems or archetype repositories in a consistent and longitudinal manner.
The goal of the present study was to clarify the archetype semantics of ISO 13606 by means of a formal methodology called typed lambda calculus, which has been widely used to define and analyze the semantics of modern programming languages and database systems.
We focus on the variance and immutability of the archetype, because these factors determine the expressive power of archetypes.
We defined a type system that represents the fundamental components of ISO 13606 semantics and analyzed the semantics based on a deductive system of the type theory.
The obtained results indicate that the archetypes should be covariant and immutable schemas in order to guarantee both the structural interoperability of EHR data and the extensibility of archetypes.
\end{abstract}
% \begin{keyword}
% ISO 13606 \sep Archetypes \sep Typed Lambda Calculus \sep Formal Semantics \sep openEHR
% \end{keyword}
%\end{frontmatter}
%\tableofcontents
\section{Introduction}
\subsection{Background}
The efficient use of medical resources and the fulfillment of large-scale clinical research requires integration of electronic healthcare record (EHR) systems that store diverse clinical data for long periods of time.
This challenge, however, remains unsolved even today.
The heterogeneity of these data sources, i.e., the interoperability of EHR data from local systems in a consistent and longitudinal manner, is a major obstacle for the integration or exchange of data across EHR systems \cite{sujansky01:_heter}.
In order to achieve interoperability, the schema of EHR data must satisfy the requirements of various organizations and evolve over time in accordance with the development of medical domains.
ISO 13606 standard defines a common healthcare model for such a purpose.
The overall goal of the standard is "to devise a generalized approach representing every conceivable kind of health record data structure in a consistent way" \cite[vi]{iso08:13606_healt_elect_part1} so as to "support the interoperability of systems and components that need to communicate EHR data" \cite[v]{iso08:13606_healt_elect_part1}.
Due to the complexity and continuing evolution of the healthcare domain, the common model must meet the needs of organizations and cope with the progress of medical domains.
In order to overcome these difficulties, ISO 13606 has adopted a dual-model approach that distinguishes two modeling layers, i.e., information layer and knowledge layer, which are explained later.
While this dual-model architecture is a major feature of ISO 13606, the original idea stems from the openEHR specification, which was also an attempt at the "sharing of EHRs via interoperability at data and knowledge levels" \cite[p.12]{beale08:_openeh_archit_overv}.
In fact, ISO 13606 is largely considered to be a subset of the full openEHR specification \cite{peter:_relationship_cen13606_hl7_openehr,chen09:_archet_conversion_ehr_model}. Therefore, both specifications share common features.
The present paper refers primarily to ISO 13606 as a standard for the interoperability of EHR systems. However, for the case in which ISO 13606 is short of sufficient descriptions and the openEHR specification is more appropriate, we will refer to the openEHR specification as a secondary resource.
As specifications become larger and more complicated, the consistency and correctness of the specifications become uncertain, and ISO 13606 is no exception.
The ISO 13606 specification describes the behavior of archetypes but does not define any formal semantics.
This is problematic because the lack of a formal definition makes the implementation of an EHR system difficult.
For example, when we define an archetype using both specialization and composition, are we allowed to combine them in an arbitrary manner? Or, are there constraints embedded in the semantics of the archetypes?
These questions have to be answered before the design and implementation of an EHR system.
Since ISO 13606 provides insufficient explanation regarding these questions, we must look back at the original openEHR specifications for clues.
In the two specifications, we found two important statements. One is cited from the ISO 13606 specification, and the other from the openEHR specification.
\begin{enumerate}
\item Assertion 1
Any data created via the use of a specialized archetype shall be conformant to both the archetype and its parent \cite[viii]{iso08:13606_healt_elect_part2}.
\item Assertion 2
Data created with any specialized archetype will always be matched by queries based on the parent archetype \cite[p.51]{beale08:_openeh_archit_overv}.
\end{enumerate}
These assertions are prerequisites for laying the groundwork on archetype semantics, because they suggest the requirements for EHR data conformance and queries based on archetype.
However, the exact meanings remain unclear because natural language descriptions leave room for a wide range of interpretations.
For example, what does the phrase 'conformant to' mean exactly? How can we assure that certain queries match appropriate EHR data and that others do not?
\subsection{Objective}
The goal of the present study is to clarify the ISO 13606 archetype semantics in order to support the two assertions above. We investigate the semantics in a formal manner under the assumption that at least these two assertions must hold, and we pursue further consequences that are logically derived from the semantics.
The term 'formal methodology' refers to the situation in which every term in a syntax or every property in semantics is defined by a symbol so that each has a precise meaning that serves as a foundation for reasoning or verification.
As a means of such formalization, we used {\bf typed lambda calculus}, which is widely used as meta-language in the field of computer science to define and analyze the semantics of various programming languages and database systems \cite{ohori89:_simpl_seman_ml_polym,igarashi01:_feath_java}.
We believe that this formalization technique can contribute to the analysis of ISO 13606 in the same manner as previous domains in computer science.
To the best of our knowledge, this is the first attempt to apply typed lambda calculus to semantics in the medical field.
\section{Material and Methods}
\subsection{Concepts for EHR data integration\label{sec:higher_concepts}}
Even if a formal methodology is necessary for building a solid semantics, it is not sufficient.
When we encounter different results from different premises, we must choose one of the results based on higher conceptual criteria, even though these results are both valid in terms of formalization.
Before introducing the formal methodology of typed lambda calculus, we begin by explaining several fundamental concepts for the exchange and integration of EHR data.
These concepts are 'structural interoperability' and 'schema extensibility'.
\subsubsection{Structural interoperability should be guaranteed}
Every standardization of clinical information, including ISO 13606, attempts to achieve a certain level of interoperability between EHR systems.
ISO 13606 defines 'interoperability' as the "ability for data shared by systems to be understood at the level of fully defined domain concepts" \cite[p.5]{iso08:13606_healt_elect_part1}.
However, the term 'to be understood' is not precise in terms of computer science. Computers can never 'understand' the meaning of data, they only compute.
Moreover, we believe that this ambiguity has its root in the fact that two different types of concepts are being explained using one term 'interoperability'.
Thus, we divided the concept of interoperability into two levels, as follows.
\begin{enumerate}
\item Structural Interoperability
Structural interoperability is defined as the ability to process data based on a shared formal schema with sound semantics.
This type of interoperability is referred to as 'structural' because modern database schemas or data models are solely concerned with the structure of data, such as field names, data types, and relations.
\item Semantic Interoperability
Semantic interoperability, in contrast, is defined as the ability to process data based on the shared meanings of terms \cite{veltman01:_syntac_seman_inter,sebastian09:_towar_seman_inter_for_elect_healt_recor,maldonado07:_framew_for_clinic_data_stand_based_archet,martinez-costa10:_iso_en_openeh,jose09:_linkeh_ed}.
In order to share the meanings of terms, we need mapping between objects in a virtual world (i.e., the computer) and concepts in the real world, because the 'meaning' refers to our interpretation of assignments between concrete objects and abstract concepts \cite[p.97]{hakimpour05:_resol_seman_heter_datab_schem}.
The concepts in the real world, however, can neither be directly represented nor processed by computer.
Therefore, the assignments are technically performed both by defining ontologies, which are surrogates for the real world concept, and by annotating terms or attributes with the corresponding items in the ontologies.
\end{enumerate}
\begin{figure*}[!htbp]
\begin{center}
\includegraphics[width=12cm]{meaning_of_semantic.eps}
\end{center}
\caption{Structural interoperability and semantic interoperability}\label{fig:meaning_of_semantic}
\end{figure*}
Figure \ref{fig:meaning_of_semantic} illustrates the difference between structural interoperability and semantic interoperability.
All three EHR systems define blood pressure in their schemas.
Systems B and C have structurally identical schemas and are therefore structurally interoperable.
System A has a different schema from the other systems, because their attribute names are written in German.
As a result, Systems A can be semantically interoperable with System B only if the mapping between the attributes and the real concepts is correctly defined.
Figure {\ref{fig:meaning_of_semantic}} appears to indicate that structural interoperability and semantic interoperability are two independent concepts.
However, structural interoperability is necessary even when semantic interoperability is to be accomplished, because semantic annotations are useful only if the valid values can be obtained without errors. For example, let us think of a situation in which we collect systolic blood pressure data both from Systems A and B in Fig. {\ref{fig:meaning_of_semantic}}. According to the ontology, we get table names (${\mathsf{BloodPressure}}$ and ${\mathsf{Blutdruck}}$) and column names (${\it systolic}$ and ${\it systolisch}$) for each system. Only when we have this structural information can we actually retrieve the data.
Thus, structural interoperability is more fundamental than semantic interoperability.
The present study focuses on the concept of structural interoperability in the investigation of archetype semantics.
\subsubsection{EHR schema should be extensible}
When every EHR system is built upon a unified and fixed EHR schema, we can easily achieve structural interoperability.
In real settings, however, building such a single giant schema is difficult, especially in the medical domain, where concepts are not only large, but also open-ended with respect to breadth, depth, and complexity \cite{rector99:_clinic_termin,renner96:_data_inter}.
The following example, which is a longitudinal cancer cohort study, illustrates the difficulties involved (Fig. \ref{fig:cohort_study}). At the start of the study, we store smoking histories as the exposure level and several tumor markers as surrogates for cancer outcome.
When a novel marker is discovered during the course of the study, we want to add the marker as an additional item, such as {\sf RevisedOutcome} in {\sf RevisedCase} (Fig. \ref{fig:cohort_study}).
Or, if an unexpected exposure, such as a nuclear plant accident, is observed, the extra exposure (in this case, an irradiation dose) should be added.
As this example indicates, when data are stored and traced longitudinally, the EHR schema must be extended as medicine progresses or a new event occurs. Otherwise, a number of disorganized schemas, many of which are similar but slightly different, will accumulate, and the cost of maintenance will increase significantly.
\begin{figure*}[!htbp]
\begin{center}
\includegraphics[width=16cm]{cancer_cohort_study_schema.eps}
\end{center}
\caption{Unified Modeling Language (UML) diagram of an imaginary cohort study}\label{fig:cohort_study}
{\footnotesize
The diagram describes the schema of an imaginary cohort study in chronological order.
The lower diagram indicates the more recent content.
}
\end{figure*}
\subsection{ISO 13606 architecture}
In this section, we present an overview of ISO 13606 in terms of the dual model architecture.
In particular, we focus on the specialization and composition of archetypes.
At the end of this section, we point out the problem associated with their combination (archetype variances).
\subsubsection{Dual-model architecture}
Like openEHR, ISO 13606, adopts a dual-model architecture (Fig. \ref{fig:dual_model_approach}) to guarantee interoperability and support flexibility of archetypes as EHR schema.
The architecture defines a Reference Model (RM) and an Archetype Model (AM) in the lower layer and archetypes in the upper layer, and it proposes building archetypes by constraining RM classes by means of an AM.
\begin{figure}[!htbp]
\begin{center}
\includegraphics[width=7cm,clip]{dual_model_approach.eps}
\end{center}
\caption{Relationships between RM, AM, and Archetype in the dual model architecture}\label{fig:dual_model_approach}
{\footnotesize
The diagram describes the dual-model architecture of ISO 13606, which consists of two layers, an information layer and a knowledge layer. The lower side of the figure indicates the information layer, in which the Reference Model and the Archetype Model are located, and the upper side indicates the knowledge layer, in which the Archetype resides.}
\end{figure}
The lower layer of the dual-model architecture contains a Reference Model and an Archetype Model.
The {\bf Reference Model} defines a set of data structures and data types from which any health record information is composed \cite[p.vi]{iso08:13606_healt_elect_part1}.
Sitting in the lower layer of the dual-model architecture, the RM represents the generic properties of health record information \cite[p.vi]{iso08:13606_healt_elect_part1} , and forms the generic building blocks of the EHR.
The {\bf Archetype Model} also sits in the lower layer, and acts as a skeleton or template for archetypes.
The model consists of "identifying information, a description (its metadata), a definition (expressed in terms of constraints on instances of an object model), and an ontology" \cite[p.vii]{iso08:13606_healt_elect_part2}.
The definition section defines the hierarchical organization of a set of nodes and constraints on the permitted values of attributes and data values \cite[p.6]{iso08:13606_healt_elect_part2}.
In other words, the AM defines the structure and constraints of EHR data from combinations of RM entities in their definition section.
In contrast to the previous models, the {\bf Archetype} resides in the upper layer of the dual-model architecture, and defines medical domains in the form of structured and constrained combinations of RM and AM entities.
While the difference between an Archetype and the AM is confusing, they reside on different layers, as shown in Fig. \ref{fig:dual_model_approach}.
The RM and AM are static because they are defined by the specification in advance, whereas Archetypes are dynamic because they can be constructed at any time as clinical practice evolves \cite[vi]{iso08:13606_healt_elect_part2}.
At this stage, the significance of the dual-model architecture is revealed.
In ordinary architectures that have only one layer of modeling, data models or a schema is already defined in a system or specification, which makes it difficult or impossible to modify or extend the schema.
In dual-model architectures, which include the basic components of schemas and the general mechanism for their construction, the actual schemas (i.e., archetypes) can be constructed dynamically from these components.
Archetypes can be serialized through a language named Archetype Definition Language (ADL) \cite[xiv]{iso08:13606_healt_elect_part2}.
While AM, RM, and Archetype are all inhabited in the layer of semantics that is invisible and abstract, ADL is inhabited in the layer of syntax that is readable and intuitive for human users.
Therefore, we presented several examples in the form of ADL fragments to help in understanding the semantics.
\subsubsection{Specialization and composition of archetypes}
The dual-model architecture enables archetypes to be defined dynamically and thus realizes schema flexibility.
Moreover, ISO 13606 provides a method by which to further enhance flexibility, namely, specialization and composition of archetypes.
First, archetypes can be specialized \cite[viii]{iso08:13606_healt_elect_part2}. New archetypes can be defined by further constraining the parent archetype. Similar to class inheritance in object-oriented languages, the child of an archetype inherits all its attributes from the parent archetype and adds new items or refines existing items in the child archetype.
Second, archetypes can be composed \cite[ix]{iso08:13606_healt_elect_part2}. A new archetype is defined by containing other archetypes when they are specified in 'slots' of the new archetype.
For example, Fig. {\ref{archetype_slot_regex} is a fragment of ADL with slots, indicating that archetypes for which the id is matched by \verb|/.*\.iso-ehr\.section\..*/| can be composed, and that archetypes of \verb|/.*\.iso-ehr\.section\.patient_details\..*/| should not be composed in the {\it{items}} attribute.
The 'composition' mechanism allows large data structures to be flexibly constrained via the hierarchical re-use of smaller archetypes \cite{beale08:_openeh_archit_overv,beale07:_archet_objec_model}.
Note that the ISO 13606 specification allows a composed archetype to be specified either by stating a unique archetype identifier, or vaguely by wildcards (i.e., matches many possible archetypes) \cite[p.61]{beale07:_archet_defin_languag} as shown in Fig. \ref{archetype_slot_regex}. This issue is discussed later herein.
\begin{figure*}[!htbp]
\begin{minipage}{1.0\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
!textsf(SECTION) [at2000] occurrences matches {0..1} matches {
!textit(items) matches {
allow_archetype !textsf(SECTION) occurrences matches {0..*} matches {
include
id matches {/.*\.iso-ehr\.section\..*/}
exclude
id matches {/.*\.iso-ehr\.section\.patient_details\..*/}
}
}
}
\end{Verbatim}
\end{minipage}
\caption{Excerpt of ADL with an archetype slot}\label{archetype_slot_regex}
\end{figure*}
\subsubsection{Variances of archetype \label{sec:variances_archetype}}
When we define an archetype by combining specialization and composition, we encounter several possibilities, referred to as variances \cite[p.26]{bruce02:_found_objec_orien_languag}\cite{castagna95:_covar}. The variance of a type is said to be covariant in its components when the type varies in the same direction as one of its parts with respect to subtyping (Fig. \ref{fig:archetype_variances}(a)). The variance of a type is said to be contravariant when the type varies in the opposite direction as one of its parts with respect to subtyping (Fig. \ref{fig:archetype_variances}(b)). Finally, the variance of a type is said to be invariant if the type is neither covariant nor contravariant (Fig. \ref{fig:archetype_variances}(c)).
We presented three alternatives of archetype variances as ADL fragments in Fig. {\ref{fig:archetype_variances_adl}} for better understanding the differences.
Note that in each example, the right-hand ADL is a specialization of the left-hand ADL.
\begin{figure*}[!htbp]
\begin{center}
\includegraphics[width=16cm]{archetype_variance.eps}
\end{center}
\caption{Three alternative archetype variances in a UML diagram}\label{fig:archetype_variances}
{\footnotesize
The figure shows three alternative archetype variances in a UML diagram. a) covariant: if $\alpha' <: \alpha$ and $\beta' <: \beta$, then $\alpha' \diamond \beta' <: \alpha \diamond \beta$, b) contravariant: if $\alpha' <: \alpha$ and $\beta' <: \beta$, then $\alpha' \diamond \beta <: \alpha \diamond \beta'$, and c) invariant: if $\alpha' <: \alpha$ and $\beta'= \beta$, then $\alpha' \diamond \beta <: \alpha \diamond \beta$.
Note that, in this figure, the operator $<:$ denotes 'is a subtype of', and $\diamond$ denotes 'is composed of'.
}
\end{figure*}
The actual variance of an archetype depends on the semantics of the archetype, i.e., how the archetype is constructed as a type. We consider this problem to be a major issue in archetype semantics because it affects the archetype expressiveness and extensibility as EHR schemas.
\begin{figure*}[!htbp]
a) covariant:
\begin{minipage}{0.49\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
archetype
ISO-EN13606-COMPOSITION.alpha.v1
definition
COMPOSITION[at0000] matches {
!textit(content) matches {
allow_archetype SECTION[at0001] matches {
include
archetype_id/value matches {
"ISO-EN13606-SECTION.beta.v1"}}}}
\end{Verbatim}
\end{minipage}
\begin{minipage}{0.49\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
archetype
ISO-EN13606-COMPOSITION.alpha-dash.v1
specialize
ISO-EN13606-COMPOSITION.alpha.v1
definition
COMPOSITION[at0000] matches {
!textit(content) matches {
allow_archetype SECTION[at0001] matches {
include
archetype_id/value matches {
"ISO-EN13606-SECTION.beta-dash.v1"}}}}
\end{Verbatim}
\end{minipage}
b) contravariant:
\begin{minipage}{0.49\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
archetype
ISO-EN13606-COMPOSITION.alpha.v1
definition
COMPOSITION[at0000] matches {
!textit(content) matches {
allow_archetype SECTION[at0001] matches {
include
archetype_id/value matches {
"ISO-EN13606-SECTION.beta-dash.v1"}}}}
\end{Verbatim}
\end{minipage}
\begin{minipage}{0.49\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
archetype
ISO-EN13606-COMPOSITION.alpha-dash.v1
specialize
ISO-EN13606-COMPOSITION.alpha.v1
definition
COMPOSITION[at0000] matches {
!textit(content) matches {
allow_archetype SECTION[at0001] matches {
include
archetype_id/value matches {
"ISO-EN13606-SECTION.beta.v1"}}}}
\end{Verbatim}
\end{minipage}
c) invariant:
\begin{minipage}{0.49\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
archetype
ISO-EN13606-COMPOSITION.alpha.v1
definition
COMPOSITION[at0000] matches {
!textit(content) matches {
allow_archetype SECTION[at0001] matches {
include
archetype_id/value matches {
"ISO-EN13606-SECTION.beta.v1"}}}}
\end{Verbatim}
\end{minipage}
\begin{minipage}{0.49\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
archetype
ISO-EN13606-COMPOSITION.alpha-dash.v1
specialize
ISO-EN13606-COMPOSITION.alpha.v1
definition
COMPOSITION[at0000] matches {
!textit(content) matches {
allow_archetype SECTION[at0001] matches {
include
archetype_id/value matches {
"ISO-EN13606-SECTION.beta.v1"}}}}
\end{Verbatim}
\end{minipage}
\caption{Examples of archetype variance in ADL}\label{fig:archetype_variances_adl}
\end{figure*}
\subsection{Typed lambda calculus}
In this section, we briefly introduce typed lambda calculus, which is the tool we use to investigate the archetype semantics. Lambda calculi are mathematical abstractions of computations, and in the typed lambda calculus, every variable must be explicitly typed.
We herein use the terms typed lambda calculus and type theory interchangeably.
\subsubsection{How type theory is applied to ISO 13606}
Before introducing the details of the typed lambda calculus, we present an overview of the correspondence between the components of typed lambda calculus and ISO 13606.
In an abstract sense, a type is a well-defined set of values, so that every model in ISO 13606 can be encoded as a type.
Because each type has its own characteristics, we must carefully decide which types can be assigned to each component of ISO 13606.
We encoded unstructured RM components as primitive types in type theory.
Not surprisingly, a string in the RM is mapped to the string type.
In contrast, structured RM components are mapped to structured types in type theory, such as record type, variant type, and list type.
For example, the TEXT type in ISO 13606 has charset, languages, and originalText as its field \cite[p.36]{iso08:13606_healt_elect_part1}.
Therefore, the TEXT type can be encoded as a record type using Eq.~\ref{eqn:text_as_record}.
Note that, in the present paper, concrete types, such as {\sf Record} or {\sf CS}, start with a capital letter and are shown in {\sf SansSerif} font.
\begin{figure*}[!htbp]
\begin{alignat}{3}
\mathsf{TEXT} \equiv \mathsf{Record}\{charset \text{:} \mathsf{CS}, languages \text{:} \mathsf{CS}, originalText \text{:} \mathsf{String} \} \label{eqn:text_as_record}
\end{alignat}
\end{figure*}
Archetypes, which are also structured, are encoded in record types.
For example, the vital signs archetype (Fig. \ref{fig:vital_signs_archetype}) can be encoded as shown in Eqs.~\ref{eqn:vital_signs_record} and \ref{eqn:vital_signs_section}.
The specialization of archetypes corresponds to Record subtyping, and the composition of archetypes corresponds to Record nesting.
\begin{figure*}[!htbp]
\begin{minipage}{1.0\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
!textit(definition)
!textsf(SECTION)[at0000] matches { -- Vital signs
!textit(items) cardinality matches {2} matches {
allow_archetype !textsf(OBSERVATION)[at0001] matches {
include
archetype_id/value matches {"openEHR-EHR-OBSERVATION.blood_pressure.v1"}
}
allow_archetype !textsf(OBSERVATION)[at0002] matches {
include
archetype_id/value matches {"openEHR-EHR-OBSERVATION.body_temperature.v1"}
}
}
}
\end{Verbatim}
\end{minipage}
\caption{Excerpt of a vital signs archetype} \label{fig:vital_signs_archetype}
{\footnotesize
This is an example of the definition section in a vital signs archetype.
Note that we modified the certified archetype in the Clinical Knowledge Manager (CKM) repository to match the explanation herein.
}
\end{figure*}
\begin{figure*}[!htbp]
\begin{alignat}{3}
\mathsf{VitalSigns} & \equiv \mathsf{Record}\{&&definition[at0000] \text{:} \mathsf{SECTION^{VitalSigns}} \} && \label{eqn:vital_signs_record}\\
\mathsf{SECTION^{VitalSigns}} & \equiv \mathsf{Record}\{&&items[at0001] \text{:} \mathsf{BloodPressure},\nonumber &&\label{eqn:vital_signs_section}\\
& &&items[at0002] \text{:} \mathsf{BloodTemperature}\}\} &&
\end{alignat}
\end{figure*}
EHR data are instances of Archetypes, \footnote{Note that this is not an "archetype instance", which is defined as a metadata class instance of an archetype model in the ISO 13606 specification \cite[p.2]{iso08:13606_healt_elect_part1}.} i.e., instances of the corresponding {\sf Record} type, because each EHR datum is created according to the corresponding archetype.
For example, Eq.~\ref{eqn:vital_signs_instance} represents an EHR datum for the vital sign archetype of Eq.~\ref{eqn:vital_signs_record}.
Note that instance names are written in lower case, e.g., $record$, and the details of $\mathsf{BloodPressure}$ and $\mathsf{BloodTemperature}$ are omitted for clarity in this equation.
\begin{figure*}[!htbp]
\begin{alignat}{3}
vital\_signs \equiv record\{definition[at0000] \text{:} record\{&items[at0001] \text{:} record\{systolic\text{:} 120, diastolic \text{:} 80\},\nonumber\\
&items[at0002] \text{:} record\{temperature\text{:} 37.5, unit \text{:} "C"\}\}\} \label{eqn:vital_signs_instance}
\end{alignat}
\end{figure*}
Table \ref{tab:correspondence} summarizes our assignment of ISO 13606 to type theory.
We believe that these assignments are intuitively plausible, as explained above.
However, this does not exclude other possibilities of assignments, which might lead to different results from those of the present study.
\begin{table}[!htbp]
\centering
\caption{Correspondence between type theory and ISO 13606}\label{tab:correspondence}
\begin{tabular}{ccc}
\toprule%[1.5pt]
ISO 13606 & type theory \\
\midrule
primitive types in RM & basic types \\
data structures in RM & structured types \\
archetype & record type \\
EHR data & instances of record type\\
\bottomrule%[1.5pt]
\end{tabular}
\end{table}
\subsubsection{Introducing types}
We next present the minimum set of typed lambda calculus that is sufficient to clarify the basis of archetype semantics (in particular variance and immutability of archetypes) by means of formalization in support of the aforementioned two assertions.
We start from fundamental types, such as the function or basic type, and progressively extend the system with more complex types, such as the record or variant type (Table \ref{tab:types_notations}).
For a more comprehensive explanation of type theory, see for instance, \cite{pierce02:_types_progr_languag,cardelli04:_type_system,barendregt13:_lambd_calcul_types}.
\begin{table}[!htbp]
\centering
\caption{List of major types and corresponding notation used in the present study}\label{tab:types_notations}
\vspace{6pt}
\begin{tabular}{lcrr}
\toprule %[1.5pt]
type & & notation \\
\midrule
function type & & \(\tau \rightarrow \sigma \)\\
basic types & & \(\mathsf{String}, \mathsf{Boolean}\), etc.\\
& \\
& record type & \(\mathsf{Record}\)\\
structured types & variant type & \(\mathsf{Variant}\)\\
& list type & \([\tau]\)\\
\bottomrule%[1.5pt]
\end{tabular}
\end{table}
First, a formal assertion relating entities such as terms or types is referred to as a {\bf judgment} (Eq.~\ref{eqn:judgment}).
The judgment $\Gamma ~ \vdash M:\tau$ should be read as "a term M has a type of $\tau$ under an environment $\Gamma$", and the judgment $\Gamma ~ \vdash \tau$ means that $\tau$ is a well-formed type under the environment $\Gamma$. In either case, $\Gamma$ is a set of assumptions about the types of free variables in $M$ and is referred to as the {\bf type environment}.
Note that, in the present study, we use M and N as terms, and $\tau$ and $\sigma$ for meta-variables over type.
\begin{alignat}{3}
& {\Gamma~ \vdash \Omega} \qquad \text{(judgment)} \label{eqn:judgment}
\end{alignat}
In typed lambda calculus, types are introduced into the system by adding {\bf typing rules}, such as those given by Eq.~\ref{eqn:function_type_rule}.
These typing rules assert the validity of certain judgments based on other valid judgments.
The judgments above the horizontal line are referred to as the premises of the rule, and those below the horizontal line are referred to as the conclusion.
The most fundamental type is the Function Type ($\tau \rightarrow \sigma$), which satisfies the typing rule given by Eq.~\ref{eqn:function_type_rule}.
The notation $(M~N)$ in Eq.~\ref{eqn:application} denotes a function application in which a function $M$ is applied to the argument $N$.
A function is represented by a lambda abstraction, such as $\lambda x\text{:}\tau.M$ (Eqs.~\ref{eqn:lambda-abstraction_term} and \ref{eqn:lambda-abstraction_typing_rule}).
In this lambda abstraction, $x$ represents a parameter of the function, and $M$ represents the body of the function \cite[p.14]{cardelli04:_type_system}.
When we apply the term $M$ of function type $\tau \rightarrow \sigma$ to the value of type $\tau$, we obtain the term of type $\sigma$ (Eq.~\ref{eqn:application}).
This function type appears in database queries (such as Eq.~\ref{eqn:user_level_query}), because a query is a function that accepts a database and returns a result set of the database.
\begin{figure*}[!htbp]
\begin{align}
\text{Typing rules:} \qquad \nonumber\\
& \frac{\Gamma~ \vdash \tau \qquad \Gamma~ \vdash \sigma}
{\Gamma~ \vdash \tau \rightarrow \sigma} & \label{eqn:function_type_rule} \text{(function type)}\\[6pt]
& \frac{\Gamma \vdash M\text{:}\tau \rightarrow \sigma \qquad \Gamma \vdash N\text{:}\tau}
{\Gamma ~ \vdash (M~N)\text{:}\sigma} & \qquad \text{(application)} \label{eqn:application}
\end{align}
\caption{Function type and its application}
\end{figure*}
\begin{figure*}[!htbp]
\begin{alignat}{3}
\text{Terms:} \qquad & \nonumber\\
M~ & {::= } ~ \dots \\
& | \quad {\lambda x\text{:}\tau.M} & \qquad (\lambda~abstraction) \label{eqn:lambda-abstraction_term}\\
\text{Typing rule:} \qquad & & \nonumber \\
& \frac{\Gamma,x\text{:}\tau~ \vdash M\text{:}\sigma }
{\Gamma ~ \vdash \lambda x\text{:}\tau.M\text{:}\tau \rightarrow \sigma} & \label{eqn:lambda-abstraction_typing_rule}
\end{alignat}
\caption{$\lambda$ abstraction}\label{eqn:lambda-abstraction}
\end{figure*}
Because EHR data are aggregates of various types of data, we need to introduce such data types.
For example, blood pressure data are constructed as aggregates with the numeric type for the pressure value and a string type for the unit of measurement.
We classify these data types into two categories: basic and structured.
Basic types, such as Boolean or Integer, are primitive data types in that they lack any internal structure.
When these primitive types are built in BasicType, they are available in our system as Eq.~\ref{eqn:basic_type_rule}.
$\Gamma~ \vdash \diamond$ means that $\Gamma$ is well-typed, which means that the type environment $\Gamma$ has nothing but valid types.
\begin{figure}[!htbp]
\begin{align}
\frac{\Gamma~ \vdash \diamond \qquad \tau \in \mathsf{BasicType}}
{\Gamma~ \vdash \tau} \label{eqn:basic_type_rule}
\end{align}
\end{figure}
ISO 13606 provides several kinds of primitive types, such as {\sf String} and {\sf Boolean} \cite[p.42]{iso08:13606_healt_elect_part1}.
For example, the {\sf Boolean} type can be introduced into the proposed type system by Eq.~\ref{eqn:basic_type_rule}, because $\mathsf{Boolean} \in \mathsf{BasicType}$ holds in ISO 13606.
As one such primitive type, we introduce the {\sf{\bf Unit}} type (Eq.~\ref{eqn:type_unit}), which has no corresponding type in ISO 13606.
The Unit type has only one value called {\it{unit}} (Eq.~\ref{eqn:unit_typing_rule}), which is often used as a filler for uninteresting results \cite{cardelli04:_type_system,pierce02:_types_progr_languag}.
In the present paper, this type is used in a field of Variant type (Eq.~\ref{eqn:type_hepatitis}) and a return value of assignments (Eq.~{\ref{eqn:assignment}}).
\begin{figure*}[!htbp]
\begin{alignat}{3}
\text{Types:} \qquad \nonumber\\
\tau~ & {::= \dots }\nonumber\\
& | ~~ \mathsf{Unit} & \qquad \label{eqn:type_unit} \text{(type unit)}\\
\text{Typing rules:} \qquad \nonumber\\
& \frac{\Gamma~ \vdash \diamond }
{\Gamma ~ \vdash unit \text{:}\mathsf{Unit}} & \label{eqn:unit_typing_rule} \text{(value unit)}
\end{alignat}
\caption{Unit}\label{fig:unit_type}
\end{figure*}
EHR data are inherently structured in that they consist of large collections of data entities of several types.
Thus, we introduce such structured data types into our type system in order to capture the ISO 13660 data model.
These structured types are Record, Variant, and List types.
\begin{enumerate}
\item Record Type
A {\sf{\bf Record}} type is a cross-product of the values with a projection operation for extracting components by name \cite[p.18]{cardelli04:_type_system,pierce02:_types_progr_languag}.
As stated previously, most of the components in archetypes, such as Fig. \ref{fig:vital_signs_archetype}, are encoded as {\sf Record} types (Eqs.~\ref{eqn:vital_signs_record} and \ref{eqn:vital_signs_section}).
The definition of structured type has three sections. In the record type, the terms section defines syntactic forms of a record and its projection (Eqs.~\ref{eqn:record_construction} and \ref{eqn:record_projection}). The types section defines the $\mathsf{Record}$ type itself (Eq.~\ref{eqn:record_type}). And, the typing rules section defines rules for the type of record construction and its projection (Eqs.~\ref{eqn:record_constructiontyping_rule} and \ref{eqn:record_projection_typing_rule}). Note that $i \in 1..n$ is shorthand for $\{ i~|~ i \in \mathbb{N} \wedge 1 \leq i \wedge i \leq n )\}$, which indicates that the variable $i$ ranges over natural numbers $\mathbb{N}$ from 1 to $n$.
Also note that a record is basically a tuple, the components of which are not indexed by numbers but rather labeled by names. Therefore, the components in {\sf{Record}} type are unordered.
\begin{figure*}[!htbp]
\begin{alignat}{4}
\text{Terms:} \qquad \nonumber\\
M~ & {::= \dots }\nonumber\\
& | ~~ record\{l_i = M\} \qquad where~i \in 1..n & (\text{record construction}) \label{eqn:record_construction}\\
& | ~~M.l_i & (\text{record projection}) \label{eqn:record_projection}\\
\text{Types:} \qquad \nonumber\\
\tau~ & {::= \dots}\nonumber\\
& |~~ \mathsf{Record}\{l_i \text{:} \tau_i \} \qquad where~i \in 1..n \label{eqn:record_type}\\
\text{Typing rules:} \qquad \nonumber\\
& \frac{\Gamma \vdash M_i\text{:}\tau_i}
{\Gamma ~ \vdash record\{l_i\text{:} M_i\}: \mathsf{Record}\{l_i \text{:} \tau_i\}} \qquad where~i \in 1..n \label{eqn:record_constructiontyping_rule} \\[6pt]
& \frac{\Gamma \vdash M \text{:} \mathsf{Record}\{l_i \text{:} \tau_i\}}
{\Gamma ~ \vdash M.l_i \text{:} \tau_i} \qquad where~i \in 1..n \label{eqn:record_projection_typing_rule}
\end{alignat}
\caption{Records}\label{fig:record_type}
\end{figure*}
\item Variant Type
While {\sf Record} type requires all of the fields to be set by values, there is a case in which only one of the fields must be filled in.
This is called the ${\sf{\bf Variant}}$ Type, which is a named disjoint union of types \cite[p.136]{pierce02:_types_progr_languag}\cite[p.19]{cardelli04:_type_system}.
Figure \ref{fig:example_variant_type} is an example of such a case, and this fragment of the archetype should be encoded as a variant type (Eqs.~\ref{eqn:value_hepatitis} and \ref{eqn:type_hepatitis}).
\begin{figure*}[!htbp]
\begin{alignat}{3}
\text{Terms:} \qquad \nonumber\\
M~& {::= \dots } \nonumber\\
& | ~~ variant\langle l_i = M_i \rangle \qquad where~i \in 1..n & (\text{variant construction}) \label{eqn:variant_construction}\\
% & | ~~ \text{case}~M~\text{of}~\langle l_i = x_i \rangle ~\Rightarrow~ M_i \qquad where~i \in 1..n & (\text{variant case match}) \label{eqn:variant_case_match}\\
\text{Types:} \qquad \nonumber\\
\tau~ & {::= \dots} \nonumber\\
& |~~ \mathsf{Variant}\langle l_i \text{:} \tau_i \rangle \qquad where~i \in 1..n \\
\text{Typing rules:} \qquad \nonumber\\
& \frac{\Gamma \vdash M_j\text{:}\tau_j \qquad where~i \in 1..n}
{\Gamma ~ \vdash variant\langle l_j\text{:} M_j\rangle: \mathsf{Variant}\langle l_i \text{:} \tau_i\rangle^{i \in 1..n} } \label{eqn:variant_constructiontyping_rule} \\[6pt]
% & \frac{\Gamma \vdash M \text{:} \mathsf{Variant}\langle l_i\text{:} \tau_i\rangle \qquad \Gamma,x_i\text{:}\tau_i ~\vdash M_i \text{:} \tau}
% {\Gamma ~ \vdash \text{case}~M~\text{of}~\langle l_i = x_i \rangle ~\Rightarrow~ M_i\text{:}\tau} \qquad where~i \in 1..n
\end{alignat}
\caption{Variants}\label{fig:variant_type}
\end{figure*}
\begin{figure}[!htbp]
\begin{minipage}{1.0\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
!textit(value) matches {
!textsf(DV_CODED_TEXT) matches {
!textit(defining_code) matches {
[local::
at1000, -- Hepatitis A
at1001 -- Hepatitis B
]
}
}
}
\end{Verbatim}
\end{minipage}
\caption{Example of a Variant Type in DV\_CODED\_TEXT}\label{fig:example_variant_type}
{\footnotesize
This fragment defines a constraint of type DV\_CODED\_TEXT, the value of which should be either hepatitis A or B.
}
\end{figure}
\begin{figure*}[!htbp]
\begin{alignat}{3}
value & : \mathsf{DV\_CODED\_TEXT^{HepatitisVirus}} \label{eqn:value_hepatitis}\\
\mathsf{DV\_CODED\_TEXT^{HepatitisVirus}} & \equiv \mathsf{Record}\{defining\_code \text{:} \mathsf{Variant}\langle \text{at1000} \text{:} \mathsf{Unit},~\text{at1001} \text{:} \mathsf{Unit} \rangle\} \label{eqn:type_hepatitis}
\end{alignat}
\end{figure*}
\item List Type
EHR data sometimes contain a series of events, such as a sequence of laboratory data in the context of hemodialysis. These kinds of collective values are represented by the {\sf{\bf List}} type \cite{cardelli04:_type_system,pierce02:_types_progr_languag,geuvers09:_introd_type_theor}.
In the present paper, the {\sf List} type is denoted by $[\tau]$, describing finite-length sequences the element of which are taken from $\tau$.
Figure \ref{fig:example_list_type} is an example of such a case, and the {\it{parts}} attribute in this fragment of the archetype should be encoded as a {\sf List} type (Eq.~\ref{eqn:type_parts}).
\begin{figure*}[!htbp]
\begin{minipage}{1.0\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
!textit(definition)
!textsf(CLUSTER) [at0000] matches {
!textit(parts) cardinality matches {0..*; ordered} matches {
allow_archetype !textsf(ENTRY) matches {
include
archetype_id/value matches {
"CEN-EN13606-ENTRY.laboratory.v1"
}
}
}
}
\end{Verbatim}
\end{minipage}
\caption{Example of a List Type in CLUSTER}\label{fig:example_list_type}
\end{figure*}
\begin{figure*}[!htbp]
\begin{alignat}{3}
definition & \text{:} \mathsf{CLUSTER^{Haemodialysis}}\\
\mathsf{CLUSTER^{Haemodialysis}} & \equiv \mathsf{Record}\{parts \text{:} [\mathsf{LaboratoryEntry}]\} \label{eqn:type_parts}
\end{alignat}
\end{figure*}
\end{enumerate}
\subsubsection{Subtyping}
EHR data are organized hierarchically.
In Fig. \ref{fig:excerpt_reference_model}, for example, {\sf ELEMENT} and {\sf CLUSTER} are subordinate to {\sf ITEM} in their hierarchy, because each subordinate class (i.e., {\sf ELEMENT} and {\sf CLUSTER}) inherits all of the members of its parent class {\sf ITEM}.
This hierarchical relation, which is implemented as inheritance in object-oriented languages, is formalized as {\bf subtyping} in type theory \cite[p.85]{craig07:_objec_orien_progr_languag}.
In this section, we introduce general rules for subtyping and specific rules for each type.
When we define the specific subtyping rules so as to be compatible with the general rules, we are able to check the validity of a given subtyping based on a deductive system of the type theory.
\begin{figure}[!htbp]
\begin{center}
\includegraphics[width=6cm]{excerpt_reference_model.eps}
\end{center}
\caption{Excerpt of the ISO 13606 Reference Model}\label{fig:excerpt_reference_model}
{\footnotesize
This UML diagram is an excerpt from the ISO 13606 Reference Model \cite[p.10]{iso08:13606_healt_elect_part1} and defines the hierarchy of {\sf ITEM} and its subordinates {\sf ELEMENT} and {\sf CLUSTER}.
}
\end{figure}
The central theorem of subtyping is the {\bf subsumption principle} (Eq.~\ref{eqn:subsumption_rule}), which says that if type $\tau'$ is a subtype of $\tau$ (i.e., $\tau' <: \tau$), then an instance of type $\tau'$ can be used in any context in which an instance of type $\tau$ can appear.
Note that the subtyping operator '$<:$' is a reflexive transitive closure (Eqs.~\ref{eqn:subtyping_reflexive} and \ref{eqn:subtyping_transitive}) similar to the numeric comparison operator '$\leq$', so that $\tau' <: \tau \text{ and}~ \tau <: \tau' $ means that $\tau \text{ is equal to } \tau'$ (cf., $a \leq b \text{ and}~ b \leq a $ means that $a = b$).
\begin{figure*}[!htbp]
\begin{alignat}{2}
\text{Typing rules:} \qquad \nonumber\\
& \frac{\Gamma~ \vdash M \text{:} \tau' \qquad \Gamma~ \vdash \tau' <: \tau}
{\Gamma~ \vdash M \text{:} \tau} & \text{(subsumption principle)}\label{eqn:subsumption_rule}\\[6pt]
& \frac{\Gamma~ \vdash \tau}
{\Gamma~ \vdash \tau <: \tau} & \label{eqn:subtyping_reflexive} \text{(reflexive)}\\[6pt]
& \frac{\Gamma~ \vdash \tau'' <: \tau' \qquad \Gamma~ \vdash \tau' <: \tau}
{\Gamma~ \vdash \tau'' <: \tau} & \label{eqn:subtyping_transitive} \text{(transitive)}
\end{alignat}
\end{figure*}
Subtyping rules should be defined for each type, and every subtyping rule should be compatible with the subsumption principle.
We next present the subtyping rules for structured types.
In the record type, one record type is a breadth subtype of another if the first has at least all of the fields of the second (Eq.~\ref{eqn:record_width_subtyping}), and the other record type is a depth subtype of another if they have exactly the same fields, but the types of the corresponding fields are subtypes (Eq.~\ref{eqn:record_depth_subtyping}).
In the variant type, the depth subtyping rule for the variant type is identical to that of the record type (cf., Eqs.~\ref{eqn:variant_depth_subtyping} and \ref{eqn:record_depth_subtyping}), but the breadth subtyping rules for the variant type is different from that for the record type: a subtype of the variant type cannot have more fields than its supertype (Eq.~\ref{eqn:variant_breadth_subtyping}).
The variance of the list type is defined as covariant, as in the case of the {\sf Record} type. Therefore, in the case of a list having elements of type $\tau'$, and $\tau' <: \tau$, we can regard the list as having elements of type $\tau$ (Eq.~{\ref{eqn:list_subtyping_rule})} \cite[p.197]{pierce02:_types_progr_languag}.
Since we abstracted archetypes as record types, archetype specialization can be modeled as record subtyping. We can add new items or modify the existing items in the specialized archetype as long as the extensions are compatible with Eqs.~\ref{eqn:record_width_subtyping} and \ref{eqn:record_depth_subtyping}.
\begin{figure*}[!htbp]
\begin{alignat}{3}
\text{Record subtyping rules:} \qquad \nonumber\\
& \frac{\Gamma \vdash \tau_i \qquad where~i \in 1..n+m}
{\Gamma ~ \vdash \mathsf{Record}\{l_i \text{:} \tau_i\}_{i \in 1..n+m} <: \mathsf{Record}\{l_i \text{:} \tau_i\}_{i \in 1..n}} & \label{eqn:record_width_subtyping}\\[6pt]
& \frac{\Gamma \vdash \tau'_i <: \tau_i}
{\Gamma ~ \vdash \mathsf{Record}\{l_i\text{:}\tau'_i \} <: \mathsf{Record}\{l_i\text{:}\tau_i\}} \qquad where~i \in 1..n & \label{eqn:record_depth_subtyping}\\
\text{Variant subtyping rules:} \qquad \nonumber\\
& \frac{\Gamma \vdash \tau_i \qquad where~i \in 1..n+m}
{\Gamma ~ \vdash \mathsf{Variant}\langle l_i\text{:} \tau_i\rangle_{i \in 1..n} <: \mathsf{Variant}\langle l_i\text{:}\tau_i\rangle_{i \in 1..n+m}} & \label{eqn:variant_breadth_subtyping}\\[6pt]
& \frac{\Gamma \vdash \tau'_i <: \tau_i}
{\Gamma ~ \vdash \mathsf{Variant}\langle l_i\text{:}\tau'_i \rangle <: \mathsf{Variant}\langle l_i\text{:}\tau_i\rangle} \qquad where~i \in 1..n & \label{eqn:variant_depth_subtyping}\\
\text{List subtyping rule:} \qquad \nonumber\\
& \frac{\Gamma \vdash \tau' <: \tau}
{\Gamma ~ \vdash [\tau'] <: [\tau]} & \label{eqn:list_subtyping_rule}
\end{alignat}
\caption{Subtyping rules for Record, Variant, and List}\label{fig:subtyping_rules_record_variant_list}
\end{figure*}
Structured types, even basic types, can be subtyped by introducing an ad hoc subtyping rule \cite[p.28]{cardelli04:_type_system}.
When we define the ad hoc rules, their actual characteristics of the values should be compatible with the subsumption principle. Otherwise, the system is consistent in theory, but useless in reality.
For example, since an integer value can be used in the context in which a real value can appear (e.g., 1 as 1.0), a subtyping rule $\mathsf{Integer} <: \mathsf{Real}$ can be introduced.
This can also be applied to the structured types in the RM, as long as they are compatible with the ISO 13606 specification. According to Fig. \ref{fig:excerpt_reference_model}, for example, we can define subtyping between {\sf ELEMENT} and {\sf ITEM} as shown in Eq.~\ref{eqn:element_subtyping} and {\sf CLUSTER} and {\sf ITEM} as shown in Eq.~\ref{eqn:cluster_subtyping}.
\begin{alignat}{3}
\frac{\Gamma~ \vdash \diamond}
{\Gamma~ \vdash \mathsf{ELEMENT} <: \mathsf{ITEM}} \qquad \text{(ad-hoc)} \label{eqn:element_subtyping} \\[6pt]
\frac{\Gamma~ \vdash \diamond}
{\Gamma~ \vdash \mathsf{CLUSTER} <: \mathsf{ITEM}} \qquad \text{(ad-hoc)} \label{eqn:cluster_subtyping}
\end{alignat}
\subsubsection{Type system and its deductive system}
Deductive proofs are necessary in order to confirm the validity of the assertions under a given system.
When we define a number of type rules based on previous disciplines, the collection of axioms and typing rules is referred as a {\bf type system}.
Once we have a type system, we can construct a deductive proof by combining the typing rules, and the proof is referred to as a {\bf derivation tree}.
Any derivation tree has leaves at the top and a root at the bottom, where each judgment is obtained from those immediately above it by applying some of the rules of the system.
Figure \ref{fig:example_derivation_tree} shows a simple example of a derivation tree, indicating that the function of type $\mathsf{String} \rightarrow \mathsf{Integer}$ yields an integer type when applied to a string value. Note that the number in brackets to the right of the horizontal line indicates the equation number of the rule that was used (in this case, {Eq.~\ref{eqn:application}}).
This deductive proof system is the most powerful feature of typed lambda calculus, because we can investigate the semantics of ISO 13606 with formal deductions.
%\begin{pf}
\begin{figure*}[!htbp]
\begin{prooftree}
\AxiomC{$\Gamma ~ \vdash \lambda x$:$\mathsf{String}.M$:$\mathsf{String} \rightarrow \mathsf{Integer}$}
\AxiomC{$\Gamma ~ \vdash "text"$:$\mathsf{String}$}
\RightLabel{[\ref{eqn:application}]}
\BinaryInfC{$\Gamma ~ \vdash ((\lambda x$:$\mathsf{String}.M)~"text") $:$ \mathsf{Integer}$}
\end{prooftree}
\caption{An example of a derivation tree}\label{fig:example_derivation_tree}
\end{figure*}
%\end{pf}
\subsection{Application of type theory to ISO 13606}
In the previous sections, we introduced basic components of type theory and the correspondence of type theory to ISO 13606.
Next, we need to encode the problems in terms of this formal language in order to support the assertions (Assertions 1 and 2).
\subsubsection{How to formalize queries \label{sec:howto_formalize_queries}}
{\bf Assertion 2}, which states that ``data created with any specialized archetype will always be matched by queries based on the parent archetype'', is related to a requirement as to which queries and specialization should be obeyed.
Before deriving the assertion from the proposed type system, we encoded this assertion in terms of typed lambda calculus as follows.
Database theories \cite{hillebrand96:_datab_query_languag_embed_typed_lambd_calcul,grust99:_compr_queries,grust99:_how_compr_queries_funct,poulovassilis96:_algeb} tell us that the database can be presented as list of data, and its queries can be encoded as lambda terms and {\tt foldr} (Eqs.~\ref{eqn:database_as_list} and \ref{eqn:query_by_foldr}).
Note that archetype is abbreviated as $\alpha$, N is a database containing a list of EHR data created by archetype $\alpha$, and $[~]$ is an empty list.
The database query has function type, because Eq.~\ref{eqn:query_by_foldr} indicates that $query~p$ takes a database that consists of a list of data (i.e., $[\alpha]$) and returns a result that also consists of a list of data (i.e., $[\alpha]$). The lambda term $p$ in Eq.~\ref{eqn:user_level_query} is a user-level query that is applied to each element of a collection, as {\tt{foldr}} calls itself recursively. The essential logic of query is embedded in term M. For instance, if we define $p \equiv \lambda x\text{:}\alpha.(\lambda xs\text{:}[\alpha].(\text{if}~ x.id == \text{"blood~pressure"} ~\text{then}~ x\text{:}xs ~\text{else}~ xs))$, then $query~p~N$ filters out the list of data for which the id field is "blood pressure".
\begin{figure*}[!htbp]
\begin{alignat}{3}
database \equiv & ~ N \text{:} [\alpha] \label{eqn:database_as_list}\\
query~p \equiv & ~ \mathtt{foldr} ~ p ~[~] \text{:} [\alpha] \rightarrow [\alpha] \label{eqn:query_by_foldr}\\
& \text{where} \quad p \equiv \lambda~x\text{:}\alpha.(\lambda~xs\text{:}[\alpha].M) \text{:} \alpha \rightarrow [\alpha] \rightarrow [\alpha] \label{eqn:user_level_query} \quad \text{and,}\\
& \qquad \text{{\tt foldr} is a higher-order combinator for recursion \cite{hutton99:tutorial_universality_expressiveness_fold,bird92:_introd_funct_progr}} \nonumber
\end{alignat}
\end{figure*}
Based on these preconditions, {\bf Assertion 2} is formally expressed in Fig. \ref{fig:formalized_second_assertion}. The next task is to derive the judgment $(query~p)~N' \text{:} [\alpha] $ from these premises to support {\bf Assertion 2}.
\begin{figure}[!htbp]
\begin{alignat}{3}
\text{provided that\quad} & \Gamma \vdash \alpha' <: \alpha \text{, and}~ \\
& \Gamma \vdash query~p\text{:}[\alpha] \rightarrow [\alpha]\text{, and}~ \\
& \Gamma \vdash N'\text{:}[\alpha'] \\
\text{then, derive\quad} & \Gamma \vdash (query~p)~N' \text{:}[\alpha] ~ &
\end{alignat}
\caption{Assertion 2 formalized}\label{fig:formalized_second_assertion}
\end{figure}
\subsubsection{How to introduce assignments for EHR data \label{sec:howto_introduce_assignments}}
Thus far, it has been impossible to update EHR data, because the record type, for example, has only the projection operation and not the assignment operation for updating (Eq.~\ref{eqn:record_projection}).
Most programming languages and database systems, however, use destructive updating methods. In other words, old values are replaced by new values.
Regarding this issue, an archetype is said to be mutable if a reference of the value can be obtained and updated by the assignment. Otherwise, it is said to be immutable, because the value cannot be mutated \cite[p.295]{harper13:_pract_found_progr_languag}.
We attempted to change archetypes from immutable to mutable by introducing update functionality, and investigated what would happen to the archetype semantics.
In order to represent mutable archetypes, an additional type called the {\sf{\bf Ref}} type must be introduced.
The value of the {\sf Ref} type is a reference that designates the mutable locations in memory \footnote{Interested readers can refer to Pierce \cite[p.159]{pierce02:_types_progr_languag} or Bruce \cite[p.77]{bruce02:_found_objec_orien_languag} for more concrete explanations.}, described as $\mathsf{Ref(\tau)}$ (Eq.~\ref{eqn:type_reference}).
If term M has type $\mathsf{Ref(\tau)}$, then M denotes a location that can hold a value of type $\tau$.
A datum can be updated by assignment operator ':=', the return type of which is {\sf Unit} type (Eqs.~\ref{eqn:type_unit} and \ref{eqn:assignment}).
\begin{figure*}[!htbp]
\begin{alignat}{3}
\text{Types:} \qquad \nonumber\\
\tau~ & {::= \dots }\\
& | ~~ \mathsf{Ref(\tau)} & \qquad \label{eqn:type_reference} \text{(type reference)}\\
\text{Typing rules:} \qquad \nonumber\\
& \frac{\Gamma~ \vdash M\text{:}\mathsf{Ref(\tau)} \qquad \Gamma~ \vdash N\text{:}\tau}
{\Gamma ~ \vdash M := N\text{:}\mathsf{Unit}} & \label{eqn:assignment} \qquad\qquad \text{(assignment)}\\
\text{Subtyping rules:} \qquad \nonumber\\
& \frac{\Gamma ~ \vdash \tau' <: \tau \qquad \Gamma ~ \vdash \tau <: \tau'}
{\Gamma ~ \vdash \mathsf{Ref(\tau')} <: \mathsf{Ref(\tau)}} \label{eqn:reference_subtyping}
\end{alignat}
\caption{Reference Type}\label{fig:reference_type}
\end{figure*}
Thus, the mutable archetype itself is encoded as $\mathsf{Archetype}\{l_i \text{:} \mathsf{Ref}(\tau_i)\}$ \cite[p.19]{cardelli04:_type_system}, and the typing rule of the archetype is given by Eq.~\ref{eqn:mutable_record_field_reference}, which indicates that the subtyping relation of mutable Archetype depends on the subtyping of the {\sf Ref} type.
Note that the {\sf Ref} type must be taken to be invariant in order to preserve type safety (Eq.~{\ref{eqn:reference_subtyping}}). This restrictive subtyping rule is necessary because a value of type {\sf Ref} can be used in a context of both reading and writing \cite[p.198]{pierce02:_types_progr_languag}\cite[p.29]{cardelli04:_type_system}.
In the Results section , we describe a critical change in the archetype semantics that is brought about by this new encoding of the archetype.
\begin{alignat}{3}
& \frac{\Gamma \vdash M_i \text{:} \mathsf{Ref}(\tau_i) }
{\Gamma ~ \vdash archetype\{l_i\text{:} M_i\} \text{:} \mathsf{Archetype}\{l_i \text{:} \mathsf{Ref}(\tau_i) \} } ~~ (i \in 1..n) \label{eqn:mutable_record_field_reference}
\end{alignat}
\section{Results}
Based on the basic semantics of ISO 13606 archetypes , we derived two assertions (Assertions 1 and 2) in this section.
Moreover, we investigated the consequences of introducing mutable archetypes and clarified the effects of this introduction on the archetype semantics.
\subsection{Archetypes should follow the subsumption principle}
We derived {\bf Assertion 1}, which states that any data created via the use of a specialized archetype shall be conformant both to the specialized archetype and its parent archetype as follows.
We divided this assertion into two cases: 1) any data created via an archetype shall be conformant to itself, and 2) any data created via the specialized archetype shall be conformant to its parent archetype.
Assume an archetype is abbreviated as $\alpha$ and a specialized archetype is abbreviated as $\alpha'$, then we get $\alpha' <: \alpha$.
Hence, the first case is straightforward because we have a trivial equality of $M \text{:} \alpha = M \text{:} \alpha$.
The second case is also straightforward. If archetypes are encoded as Record types and follow the subsumption principle, then the second case states the subsumption principle itself (cf., Eq.~\ref{eqn:subsumption_rule}).
\subsection{Archetypes should be covariant}
In Section \ref{sec:variances_archetype}, we encountered three alternatives of variance when we combined archetypes by both specialization and composition.
By defining the archetypes as $\mathsf{Archetype}\{l_i \text{:} \tau_i \}$ and combining subtyping rules for record type (Eqs.~\ref{eqn:record_width_subtyping} and \ref{eqn:record_depth_subtyping}), we obtain the subtyping rule for archetypes as Eq.~\ref{eqn:archetype_subtyping}, which reveals that archetypes are covariant in their components.
\begin{alignat}{3}
& \frac{\Gamma \vdash \tau'_i <: \tau_i \qquad where~i \in 1..n+m}
{\Gamma ~ \vdash \mathsf{Archetype}\{l_i\text{:} \tau'_i \}_{i \in 1..n+m} <: \mathsf{Archetype}\{l_i\text{:} \tau_i\}_{i \in 1..n}} & \label{eqn:archetype_subtyping}
\end{alignat}
Once the variance of archetypes is fixed as covariant, this affects not only the underlying semantics but also the syntax of the archetype slot. Suppose that an archetype composes another archetype with a regular expression $\text{/openEHR-EHR-CLUSTER.device.*/}$ in the archetype slot and that its specialized archetype composes the other archetype with \verb|/*/|. In this case, any data, even if the data are a supertype of the CLUSTER-device, can be set to the field, which violates the covariant requirement. Therefore, we claim that every composed archetype should be uniquely specified in the archetype slot by an exact archetype identifier, such as that shown in Fig. \ref{fig:archetype_slot_in_string}. In addition, we do not have to explicitly exclude improper archetypes, because they are implicitly excluded if they are not subtypes of the specified archetype.
\begin{figure*}[!htbp]
\begin{minipage}{1.0\linewidth}
\begin{Verbatim}[frame=single,fontsize=\small, commandchars=!\(\)]
!textsf(ITEM_TREE)[at0011] matches {
!textit(items) matches {
allow_archetype matches {
include
archetype_id/value matches {"openEHR-EHR-CLUSTER.device.v1"}
}
}
}
\end{Verbatim}
\end{minipage}
\caption{An example of a uniquely identified archetype slot}\label{fig:archetype_slot_in_string}
{\footnotesize
In this fragment of ADL, the archetype 'openEHR-EHR-CLUSTER.device.v1' is identified as a string in the archetype slot, and so is uniquely specified.
}
\end{figure*}
\subsection{Queries should be reusable}
Assertion 2 insists that queries should be reusable.
We define a number of archetypes at a certain point in time, and the data are stored in the EHR system based on the defined archetypes.
As time passes, some of the archetypes are specialized to create more detailed archetypes, as in Fig. {\ref{fig:cohort_study}}. New EHR data, which are conformant to the specialized archetypes, will accumulate in the system.
Then, it is natural to want to run the queries defined on the first generation archetypes on the EHR data that are conformant to their specialized archetypes.
This is what Assertion 2 demands.
In Section \ref{sec:howto_formalize_queries}, we formalized Assertion 2 in terms of typed lambda calculus, as shown in Fig. \ref{fig:formalized_second_assertion}.
The formalized statement of the assertion was used to derive $(query~p)~N' \text{:}[\alpha]$ from the assumptions $\alpha' <: \alpha, query~p\text{:}[\alpha] \rightarrow [\alpha]$, and $N'\text{:}[\alpha']$.
Note that $\alpha$ is a parent archetype, $\alpha'$ is the specialized archetype of $\alpha$, and $N' \text{:} [\alpha']$ refers to ``a list of data created by the specialized archetype $\alpha'$''.
This is case A in Fig. \ref{fig:archetype_queries}, because we are applying a query for a parent archetype $\alpha$ to the data created from its specialized archetype $\alpha'$.
Given these conditions, we constructed the following derivation tree to support Assertion 2 (Fig. \ref{derivation_second_assertion}).
\begin{figure*}[!htbp]
\begin{center}
\includegraphics[width=10cm,clip]{archetype_based_queries.eps}
\end{center}
\caption{Two cases of query applications}\label{fig:archetype_queries}
{\footnotesize
This figure describes the relationships between archetypes, EHR data, and their queries. Four arrows are drawn from queries to EHR data. The two solid arrows indicate ordinary queries, the argument type of which is matched to the type of the EHR data (e.g., $M\text{:}[\alpha] \rightarrow [\alpha]$ and $N\text{:}[\alpha]$). The two dotted arrows indicate the cases for which we investigated the validity in the text.
}
\end{figure*}
%\begin{pf}
\begin{figure*}[!htbp]
\begin{prooftree}
\AxiomC{$\Gamma~ \vdash \alpha' <: \alpha$}
\RightLabel{\quad [\ref{eqn:list_subtyping_rule}]}
\UnaryInfC{$\Gamma~ \vdash [\alpha'] <: [\alpha]$}
\AxiomC{$\Gamma~ \vdash N'\text{:} [\alpha']$}
\RightLabel{\quad [\ref{eqn:subsumption_rule}]}
\BinaryInfC{$\Gamma~ \vdash N'\text{:} [\alpha]$}
\AxiomC{$\Gamma~ \vdash query~p \text{:} [\alpha] \rightarrow [\alpha]$}
\RightLabel{\quad [\ref{eqn:application}]}
\BinaryInfC{$\Gamma~ \vdash (query~p)~N'\text{:}[\alpha]$}
\end{prooftree}\caption{Derivation of Assertion 2}\label{derivation_second_assertion}
\end{figure*}
%\end{pf}
The significance of this derivation is not so clear. Therefore, we presented the opposite case of applying a query for the specialized archetype $\alpha'$ to the data from the parent archetype $\alpha$. This assumption corresponds to case B in Fig. \ref{fig:archetype_queries}.
In this case, the query is encoded as $M' \text{:} [\alpha'] \rightarrow [\alpha']$, and the target database is encoded as $N\text{:}[\alpha]$. Then, it is obvious that $(M'~N)$ causes a type error, because $[\alpha']$ does not match $[\alpha]$ (cf., Eq.~\ref{eqn:application}). This indicates that the queries based on the parent archetype are applicable to the data created by its specialized archetype whereas queries based on a specialized archetype are not applicable to data created with the parent archetypes.
\subsection{Archetypes should be invariant if they are mutable schema}
Section \ref{sec:howto_introduce_assignments} revealed that if we want to update EHR data, we must encode Archetype as $\mathsf{Archetype}\{l_i\text{:} \mathsf{Ref}(\tau^i) \}$ as in Eq.~\ref{eqn:mutable_record_field_reference}. Therefore, we need to determine how to derive the judgment $\mathsf{Archetype}\{l\text{:} \mathsf{Ref(\tau')} \} <: \mathsf{Archetype}\{l\text{:} \mathsf{Ref(\tau)}\}$ based on the modified type system.
Since subtyping of {\sf Ref} types is defined in Eq.~\ref{eqn:reference_subtyping}, we derived the tree shown in Fig. \ref{fig:derivation_mutable_archetype} by combining the mutable Archetype judgment and the {\sf Ref} subtyping rules.
%\begin{pf}
\begin{figure}[!htbp]
\begin{prooftree}
\AxiomC{$\tau' <: \tau$}
\AxiomC{$\tau <: \tau'$}
\RightLabel{\quad [\ref{eqn:reference_subtyping}]}
\BinaryInfC{$\mathsf{Ref(\tau')} <: \mathsf{Ref(\tau)}$}
\RightLabel{\quad [\ref{eqn:archetype_subtyping}]}
\UnaryInfC{$\mathsf{Archetype}\{l\text{:} \mathsf{Ref(\tau')} \} <: \mathsf{Archetype}\{l\text{:} \mathsf{Ref(\tau)}\}$}
\end{prooftree}\caption{Derivation of the mutable archetype}\label{fig:derivation_mutable_archetype}
\end{figure}
%\end{pf}
Figure \ref{fig:derivation_mutable_archetype} shows that subtyping of mutable archetypes depends on the type of $\tau$ and $\tau'$. However, $\tau' <: \tau \text{ and}~ \tau <: \tau' $ means that $\tau \text{ is equal to } \tau'$, because subtyping operator $<:$ is both transitive and reflexive (Eqs.~\ref{eqn:subtyping_reflexive} and \ref{eqn:subtyping_transitive}). Therefore, the derivation of Fig. \ref{fig:derivation_mutable_archetype} reveals that archetypes must be invariant if they are mutable archetypes.
Since this consequence is less intuitive, it is demonstrated through a counterexample. Based on Eq.~\ref{eqn:hypothesis}, let us assume that mutable archetypes are covariant. Then, we identify an inconsistency from this false hypothesis.
\begin{figure*}[!htbp]