-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdiff.tex
6284 lines (5109 loc) · 511 KB
/
diff.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
%DIF 1a1
%DIF LATEXDIFF DIFFERENCE FILE
%DIF DEL thesisSubmitted.tex Sat Aug 11 12:24:53 2018
%DIF ADD thesis.tex Tue Aug 14 09:53:08 2018
% !TeX spellcheck = en_US %DIF >
%DIF -------
% Documentation:
%
% The UoK class file extends the standard report style to follow the Registry
% guidelines for laying out a thesis. It sets the margins, interline spacing,
%DIF 5c6
%DIF < % the page, figure and table numbering style, and disallows page breaks at
%DIF -------
% the page, Figure and table numbering style, and disallows page breaks at %DIF >
%DIF -------
% hyphens. The class file consists of setting one and an half line spacing text
% with a 4cm left margin, at least a 2.5cm right margin, approximately 2cm top
% and bottom margin, on A4 paper.
%
% The class the following options, in addition to those of the standard report
% class.
% mini - Toggles the thesis in to mini-thesis mode. This adds "mini" to the
% title and appends a nocite(*) at the end for an automatic output of
% your complete bibliography.
% draftmark - Puts a DRAFT' watermark on every page of the document along
% with the draft statement on the title page. Additionaly, it
% is used as a switch for the UoKExtentions package.
% draft - Puts the entire document into draft mode. Applies all the effect
% of draftmark above, but also propergates to other packages used.
% copyright - Adds a copyright page between the title page and the preface.
%DIF 21c22
%DIF < % nofig - Disables output of the list of figures in the preface.
%DIF -------
% nofig - Disables output of the list of Figures in the preface. %DIF >
%DIF -------
% notab - Disables output of the list of tables in the preface.
% All options passed to UoKthesis will be passed along to included packages:
% natbib, draftwatermark, setspace, hyperref, lmodern
%
% The cover page and optional copyright page are implicitly added before the
% start of the preface section. Use the following commands to populate the
% cover page/copyright page information:
% \title{thesis title}
% \author{author's name}
% \degree{Master of Science, Doctor of Philosophy, etc.}
% \subject{author's department}
% - Computer Science if omitted
% \submitdate{month year in which submitted}
% - dated by LaTeX if omitted
% \copyrightyear{year degree conferred (next year if submitted in Dec.)}
% - assumes current year (or next year, in December) if omitted
%
% The preface environment allows for the use of sections that precede the main
% document; such as Abstract and Acknowlegements. These sections should be
% defined using \section{Preface Section Title}. The contents page (and list of
%DIF 42c43
%DIF < % figures and tables if in use) will be automatically inserted at the end of the
%DIF -------
% Figures and tables if in use) will be automatically inserted at the end of the %DIF >
%DIF -------
% preface environment.
%
% The thesis style invokes the setspace package to set the commands:
% \doublespace
% \onehalfspace
% \singlespace
% for spacing. By default one and an half spacing is used which resembles the
% UKC Typewriter requirement. Singlespace can be used for letterpress
% appearance. If you want to use true double space, for some reason, place the
% \doublespace command where you want to start using double spacing. Just call
% the appropriate spacing command at where you want to use them.
%
%DIF 55c56
%DIF < % In the figure and table environments, single spacing is used. If you want to
%DIF -------
% In the Figure and table environments, single spacing is used. If you want to %DIF >
%DIF -------
% use any other size rather than one and an half spacing, then do:
% \renewcommand{\baselinestretch}{1.6} (or whatever you want instead of 1.6)
% This command won't take effect unless it comes before the \begin{document} or
% is triggered by a font change (after something like \small \normalsize).
%
% The example below shows the 12pt thesis style being used. This seems to give
% acceptable looking results, but it may be omitted to get 10pt. Alternatively,
% the 11pt option can be used.
%
% This version differs from old_ukcthesis.sty in the following ways:
% 1. Removed the doublespace package (now uses setspace).
% 2. Merged the phantom section for correct PDF links into the bibliography
% generating function.
% 3. Added thesis type options (mini, draft).
% 4. Kent Harvard is used for referencing and citation, this is supported by the
% natbib package.
% 5. PsFig macro removed.
% 6. Now comes as two files, UoKthesis.cls, which defines purely stylistic layout,
% and UoKextentions.sty, that provideds some additional functionality.
\documentclass[12pt]{UoKthesis}
%\renewcommand*\rmdefault{ptm}
%\renewcommand{\familydefault}{\rmdefault}
% Note: The UoKextentions package includes the xcolor package with the [usenames]
% options. If you need to add further options, these can be given to UoKextentions
% to be propogated through.
\usepackage{UoKextentions}
\usepackage{times}
%\usepackage{llncsdoc}
%\usepackage{verbatim}
\usepackage{url}
\usepackage{color}
\usepackage{adjustbox}
\usepackage{amsmath}
\usepackage{relsize}
\usepackage[final]{listings}
\usepackage[T1]{fontenc}
%\usepackage[math]{times}
\usepackage{mathptmx}
\usepackage{graphicx}
\usepackage{wrapfig}
\usepackage[scaled=.90]{helvet}
\usepackage{courier}
\newcommand{\td}[1]{{\bf {\tt{#1}}}}
\newcommand{\comment}[1]{\textcolor{red}{\td{{#1}}}}
\usepackage{textcomp}
\usepackage{csquotes}
\lstset{
frame=none,
xleftmargin=2pt,
stepnumber=1,
numbers=left,
numbersep=5pt,
numberstyle=\ttfamily\tiny\color[gray]{0.3},
belowcaptionskip=\bigskipamount,
captionpos=b,
escapeinside={*'}{'*},
language=haskell,
tabsize=2,
emphstyle={\bf},
commentstyle=\it,
stringstyle=\mdseries\ttfamily,
showspaces=false,
keywordstyle=\bfseries\ttfamily,
columns=flexible,
upquote=true,
showstringspaces=false,
basicstyle=\small\ttfamily,
breaklines=true,
morecomment=[l]\%,
}
% Kent Harvard Bibliography Style. WIP
\bibliographystyle{kentHarvard}
% Provides nice linking in PDFs
\usepackage{hyperref}
% Only needed if you want to produce an index. Example is shown at the bottom of this document.
\usepackage{makeidx}
% Useful packages
% \usepackage{epstopdf} % Converts EPS files to PDF using ghostscript
% \usepackage{fnbreak} % Warns you if you have split footnotes
% \usepackage{mathpazo} % Typeset mathematics in the Palatino family of text fonts
% \usepackage{paralist} % Enumerate and itemize within paragraphs
% \usepackage{amsmath} % AMS mathematical facilities
% \usepackage{rotating} % Rotating facilities for floats
%DIF 146a147
\usepackage{caption} %DIF >
%DIF -------
\usepackage{subcaption}
\setcounter{secnumdepth}{3} % add more section types
%%%%% macros
\def\fixme#1{\fbox{\textbf{\textsc{Fixme}}\quad#1}}
\def\fixpic#1{\fbox{\textbf{\textsc{Picture}}\quad#1}}
\def\defnx#1#2{\emph{#1}\index{#2}}
\def\defn#1{\defnx{#1}{#1}}
\def\floatpic#1#2{%
\begin{wrapfigure}{r}{\dimexpr #1 / 2 \relax}
\includegraphics[width=\dimexpr #1 / 2 \relax]{#2}
\end{wrapfigure}}
\def\inlinepic#1#2{%
\begin{center}
\includegraphics[width=\dimexpr #1 / 2 \relax]{#2}
\end{center}}
%%%%% augment hyphenation
\hyphenation{wide-spread}
%%%%% document start
%DIF PREAMBLE EXTENSION ADDED BY LATEXDIFF
%DIF UNDERLINE PREAMBLE %DIF PREAMBLE
\RequirePackage[normalem]{ulem} %DIF PREAMBLE
\RequirePackage{color}\definecolor{RED}{rgb}{1,0,0}\definecolor{BLUE}{rgb}{0,0,1} %DIF PREAMBLE
\providecommand{\DIFaddtex}[1]{{\protect\color{blue}\uwave{#1}}} %DIF PREAMBLE
\providecommand{\DIFdeltex}[1]{{\protect\color{red}\sout{#1}}} %DIF PREAMBLE
%DIF SAFE PREAMBLE %DIF PREAMBLE
\providecommand{\DIFaddbegin}{} %DIF PREAMBLE
\providecommand{\DIFaddend}{} %DIF PREAMBLE
\providecommand{\DIFdelbegin}{} %DIF PREAMBLE
\providecommand{\DIFdelend}{} %DIF PREAMBLE
%DIF FLOATSAFE PREAMBLE %DIF PREAMBLE
\providecommand{\DIFaddFL}[1]{\DIFadd{#1}} %DIF PREAMBLE
\providecommand{\DIFdelFL}[1]{\DIFdel{#1}} %DIF PREAMBLE
\providecommand{\DIFaddbeginFL}{} %DIF PREAMBLE
\providecommand{\DIFaddendFL}{} %DIF PREAMBLE
\providecommand{\DIFdelbeginFL}{} %DIF PREAMBLE
\providecommand{\DIFdelendFL}{} %DIF PREAMBLE
%DIF HYPERREF PREAMBLE %DIF PREAMBLE
\providecommand{\DIFadd}[1]{\texorpdfstring{\DIFaddtex{#1}}{#1}} %DIF PREAMBLE
\providecommand{\DIFdel}[1]{\texorpdfstring{\DIFdeltex{#1}}{}} %DIF PREAMBLE
\newcommand{\DIFscaledelfig}{0.5}
%DIF HIGHLIGHTGRAPHICS PREAMBLE %DIF PREAMBLE
\RequirePackage{settobox} %DIF PREAMBLE
\RequirePackage{letltxmacro} %DIF PREAMBLE
\newsavebox{\DIFdelgraphicsbox} %DIF PREAMBLE
\newlength{\DIFdelgraphicswidth} %DIF PREAMBLE
\newlength{\DIFdelgraphicsheight} %DIF PREAMBLE
% store original definition of \includegraphics %DIF PREAMBLE
\LetLtxMacro{\DIFOincludegraphics}{\includegraphics} %DIF PREAMBLE
\newcommand{\DIFaddincludegraphics}[2][]{{\color{blue}\fbox{\DIFOincludegraphics[#1]{#2}}}} %DIF PREAMBLE
\newcommand{\DIFdelincludegraphics}[2][]{% %DIF PREAMBLE
\sbox{\DIFdelgraphicsbox}{\DIFOincludegraphics[#1]{#2}}% %DIF PREAMBLE
\settoboxwidth{\DIFdelgraphicswidth}{\DIFdelgraphicsbox} %DIF PREAMBLE
\settoboxtotalheight{\DIFdelgraphicsheight}{\DIFdelgraphicsbox} %DIF PREAMBLE
\scalebox{\DIFscaledelfig}{% %DIF PREAMBLE
\parbox[b]{\DIFdelgraphicswidth}{\usebox{\DIFdelgraphicsbox}\\[-\baselineskip] \rule{\DIFdelgraphicswidth}{0em}}\llap{\resizebox{\DIFdelgraphicswidth}{\DIFdelgraphicsheight}{% %DIF PREAMBLE
\setlength{\unitlength}{\DIFdelgraphicswidth}% %DIF PREAMBLE
\begin{picture}(1,1)% %DIF PREAMBLE
\thicklines\linethickness{2pt} %DIF PREAMBLE
{\color[rgb]{1,0,0}\put(0,0){\framebox(1,1){}}}% %DIF PREAMBLE
{\color[rgb]{1,0,0}\put(0,0){\line( 1,1){1}}}% %DIF PREAMBLE
{\color[rgb]{1,0,0}\put(0,1){\line(1,-1){1}}}% %DIF PREAMBLE
\end{picture}% %DIF PREAMBLE
}\hspace*{3pt}}} %DIF PREAMBLE
} %DIF PREAMBLE
\LetLtxMacro{\DIFOaddbegin}{\DIFaddbegin} %DIF PREAMBLE
\LetLtxMacro{\DIFOaddend}{\DIFaddend} %DIF PREAMBLE
\LetLtxMacro{\DIFOdelbegin}{\DIFdelbegin} %DIF PREAMBLE
\LetLtxMacro{\DIFOdelend}{\DIFdelend} %DIF PREAMBLE
\DeclareRobustCommand{\DIFaddbegin}{\DIFOaddbegin \let\includegraphics\DIFaddincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFaddend}{\DIFOaddend \let\includegraphics\DIFOincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFdelbegin}{\DIFOdelbegin \let\includegraphics\DIFdelincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFdelend}{\DIFOaddend \let\includegraphics\DIFOincludegraphics} %DIF PREAMBLE
\LetLtxMacro{\DIFOaddbeginFL}{\DIFaddbeginFL} %DIF PREAMBLE
\LetLtxMacro{\DIFOaddendFL}{\DIFaddendFL} %DIF PREAMBLE
\LetLtxMacro{\DIFOdelbeginFL}{\DIFdelbeginFL} %DIF PREAMBLE
\LetLtxMacro{\DIFOdelendFL}{\DIFdelendFL} %DIF PREAMBLE
\DeclareRobustCommand{\DIFaddbeginFL}{\DIFOaddbeginFL \let\includegraphics\DIFaddincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFaddendFL}{\DIFOaddendFL \let\includegraphics\DIFOincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFdelbeginFL}{\DIFOdelbeginFL \let\includegraphics\DIFdelincludegraphics} %DIF PREAMBLE
\DeclareRobustCommand{\DIFdelendFL}{\DIFOaddendFL \let\includegraphics\DIFOincludegraphics} %DIF PREAMBLE
%DIF LISTINGS PREAMBLE %DIF PREAMBLE
\lstdefinelanguage{codediff}{ %DIF PREAMBLE
moredelim=**[is][\color{red}]{*!----}{----!*}, %DIF PREAMBLE
moredelim=**[is][\color{blue}]{*!++++}{++++!*} %DIF PREAMBLE
} %DIF PREAMBLE
\lstdefinestyle{codediff}{ %DIF PREAMBLE
belowcaptionskip=.25\baselineskip, %DIF PREAMBLE
language=codediff, %DIF PREAMBLE
basicstyle=\ttfamily, %DIF PREAMBLE
columns=fullflexible, %DIF PREAMBLE
keepspaces=true, %DIF PREAMBLE
} %DIF PREAMBLE
%DIF END PREAMBLE EXTENSION ADDED BY LATEXDIFF
\begin{document}
\title{Data-Driven Refactorings for Haskell}
\author{Stephen Adams}
\subject{Computer Science}
\degree{PhD}
\begin{preface}
\section{Abstract}
Refactoring is the process of changing the internal structure of a program without changing its external behaviour. The goal of performing refactorings is to increase code quality. However, refactoring by hand is time consuming and error-prone. This makes automated refactoring tools very useful.
Agile software development allows for software to evolve slowly over time. This evolution changes how a program processes, abstracts over, and views its data. This evolution, though necessary, comes with the cost of technical debt. As technical debt increases changes to a code base become more difficult. Refactoring is one of the primary ways to reduce technical debt.
There exist refactorings that specifically help software to evolve its data model, however these refactorings are specific to the object-oriented programming paradigm. Haskell is a strongly typed, pure functional programming language. Haskell's rich type system allows for complex and powerful data models and abstractions. This thesis reports on work done to design and automate refactorings that help Haskell programmers develop and evolve these abstractions.
This work also discussed the current design and implementation of HaRe (the \textit{Ha}skell \textit{Re}factorer). HaRe now supports the Glasgow Haskell Compiler's implementation of the Haskell 2010 standard and its extensions, and uses some of GHC's internal packages in its implementation.
\section{Acknowledgements}
First I would like to thank my supervisor Professor Simon Thompson, for his guidance and patience throughout this process. Without his help this thesis would not have been possible.
I also need to thank my parents, Jeff and Diane, for their support and encouragement. You both have been so excited for me to take this opportunity. It can't have been easy to have me live an ocean away, and your support in me started this project has been amazing.
A special thanks go out to Kristin Lamberty, Elena Machkasova, and Nic McPhee the professors of computer science at the University of Minnesota, Morris. All three of you introduced me to the world of computing and I wouldn't be here today without all of your help. You prepared me as much as anyone can be prepared for a PhD.
Alan Zimmerman is cited many times in this thesis but I don't think that is sufficient thanks for all the work he has done. Without you HaRe would not be where it is today. You dedication to the Haskell open source community is much appreciated.
I would like to extend my thanks to Adriana and Gaya Perera for letting me stay with them for the final year of my degree. You very graciously opened your house to me when I needed it and allowed me to stay much longer than any of us were expecting.
Finally I must thank Rosemary for her love and support. You have supported me through some of the toughest times of my life. Without you this would not have been finished, and without you this accomplishment would mean nothing.
\end{preface}
\chapter{Introduction}\label{chp:intro}
\section{Functional Programming}
Functional programming is a programming paradigm that focuses on data values as described by expressions which are built from function applications and definitions~\citep{elementsOfFunc}. Functions in this case are closely related to the idea of mathematical functions. What qualifies a programming language as functional is debatable but several concepts are often included in languages that are described as functional.
First class functions mean that functions are allowed to be treated like any other data type. First class functions can be passed to or returned from from other ``higher-order'' functions. Iteration is accomplished through recursion rather than via looping. There is also a heavy emphasis on functions remaining ``pure\DIFaddbegin \DIFadd{,}\DIFaddend '' that is without side effects\DIFdelbegin \DIFdel{, }\DIFdelend \DIFaddbegin \DIFadd{; }\DIFaddend though functional languages do provide ways to use IO or state\DIFaddbegin \DIFadd{, }\DIFaddend it is emphasised that functions should remain pure if at all possible. Haskell's type system allows for effectful computations to be contained within monad types.
\section{Haskell}
\label{haskell}
Haskell is a statically typed, lazily evaluated, pure functional language. Haskell \DIFdelbegin \DIFdel{is strongly and statically typed, and }\DIFdelend \DIFaddbegin \DIFadd{also }\DIFaddend supports Hindley-Milner type inference(\cite{hindley}\DIFdelbegin \DIFdel{,}\DIFdelend \DIFaddbegin \DIFadd{;}\DIFaddend \cite{milner}). Type inference means that \DIFdelbegin \DIFdel{a }\DIFdelend Haskell programs do not need \DIFdelbegin \DIFdel{every type }\DIFdelend \DIFaddbegin \DIFadd{the type of every top level binding }\DIFaddend to be explicitly \DIFdelbegin \DIFdel{listed in the source code}\DIFdelend \DIFaddbegin \DIFadd{provided by the programmer}\DIFaddend . Types will be~\textit{inferred} at compilation time so that every part of a Haskell program's type is known at that time. Haskell's type system also allows users to define \DIFaddbegin \DIFadd{and use }\DIFaddend their own types.
Lazy evaluation, also known as call-by-need~\citep{wadsworth}, means that Haskell expressions are not evaluated when they are passed as a parameter, but rather when that value is used. For example, in the Haskell function \texttt{f}, in \DIFdelbegin \DIFdel{figure}\DIFdelend \DIFaddbegin \DIFadd{Figure}\DIFaddend ~\ref{lazyY}, the parameter y will never be evaluated. Lazy evaluation is more nuanced than call-by-name style parameter passing. Composite data types will be evaluated only as much as required to allow the computation to continue. This allows for both partial and infinite data types, for example \texttt{[1..]} is \DIFdelbegin \DIFdel{a }\DIFdelend \DIFaddbegin \DIFadd{the }\DIFaddend list containing all of the natural numbers.
\begin{figure}[t]
\DIFdelbeginFL %DIFDELCMD < \begin{verbatim}%DIFDELCMD <
%DIFDELCMD < f x y = case x > 0 of
%DIFDELCMD < True -> x - 1
%DIFDELCMD < False -> x + 1
%DIFDELCMD < \end{verbatim}
%DIFDELCMD < %%%
\DIFdelendFL \DIFaddbeginFL \begin{lstlisting}
f x y = case x > 0 of
True -> x - 1
False -> x + 1
\end{lstlisting}
\DIFaddendFL \caption{A simple function}
\label{lazyY}
\end{figure}
Haskell is also a pure language. Purity is the idea that functions cannot perform actions in addition to returning values. These additional actions are known as side-effects. Haskell allows for traditionally side-effect causing operations (IO, state, etc.) through the use of monads. Monads represent computations, which when run will have effects as well as producing a result. A principle topic of this thesis is refactoring to support patterns of computation related to Monads and other related structures.
\DIFaddbegin \subsection{User defined types in Haskell}
\DIFadd{All high level programming languages come with some predefined set of types and, typically, some way for users to define new types. In Haskell new types can be introduced with the }\texttt{\DIFadd{data}} \DIFadd{keyword. Figure~\ref{simpleDataTy} shows a simple data type, }\texttt{\DIFadd{Choice}}\DIFadd{. There are only two different values a }\texttt{\DIFadd{Choice}} \DIFadd{can be, }\texttt{\DIFadd{Yes}} \DIFadd{or }\texttt{\DIFadd{No}}\DIFadd{, these are the }\textit{\DIFadd{constructors}} \DIFadd{of }\texttt{\DIFadd{Choice}}\DIFadd{.
}
\begin{figure}[t]
\begin{lstlisting}
data Choice = Yes | No
\end{lstlisting}
\caption{\DIFaddFL{A simple data type that models a yes/no choice.}}
\label{simpleDataTy}
\end{figure}
\DIFadd{Data types can also extend existing Haskell types. Returning to the }\texttt{\DIFadd{Choice}} \DIFadd{example maybe we want to be able to store some text along with a }\texttt{\DIFadd{Yes}} \DIFadd{to explain why this choice was made. This modified definition of }\texttt{\DIFadd{Choice}} \DIFadd{is shown in Figure~\ref{stringChoice}. Now a call to the }\texttt{\DIFadd{Yes}} \DIFadd{constructor needs to be passed some value of type }\texttt{\DIFadd{String}} \DIFadd{to construct a value of type }\texttt{\DIFadd{Choice}}\DIFadd{.
}
\begin{figure}[t]
\begin{lstlisting}
data Choice = Yes String | No
\end{lstlisting}
\caption{\texttt{\DIFaddFL{Yes}} \texttt{\DIFaddFL{Choice}}\DIFaddFL{s now contain a string as well.}}
\label{stringChoice}
\end{figure}
\DIFadd{Haskell allows you to take this one step further however. The creator of a type does not need to specify exactly which types will be stored within the new type, this means that types can be }\textit{\DIFadd{parameterized}}\DIFadd{. Figure~\ref{choiceParam} shows a version of }\texttt{\DIFadd{Choice}} \DIFadd{where the }\texttt{\DIFadd{Yes}} \DIFadd{choices can contain any type }\texttt{\DIFadd{a}}\DIFadd{. This is known as parametric polymorphism~\mbox{%DIFAUXCMD
\citep{haskellWikiPolymorphism}}\hspace{0pt}%DIFAUXCMD
. This type of polymorphism allows users to effectively reuse their types again and again without having to redefine them every time a new ``inner'' type is needed.
}
\begin{figure}[t]
\begin{lstlisting}
data Choice a = Yes a | No
\end{lstlisting}
\caption{\texttt{\DIFaddFL{Yes}} \texttt{\DIFaddFL{Choice}}\DIFaddFL{s can now hold any other type.}}
\label{choiceParam}
\end{figure}
\subsection{Type classes}
\DIFadd{Now that we have seen how users of Haskell can define their own data types, and how types can be defined in a generic way, making the definitions applicable to a whole set of arguments. Haskell is a functional language so a natural next question is to wonder that, if we can define types that are generic in some ways, can the same be said of functions? The answer is yes, this is what type classes do for Haskell.
}
\DIFadd{A type class allows a user to define a set of functions and/or constant names and their types. Then definitions of these functions and constants can be given for specific types. Figure~\ref{choiceEq} shows the type class for }\texttt{\DIFadd{Eq}} \DIFadd{which describes what it means for types to have a notion of equality. Figure~\ref{choiceEq} also shows how the definition of }\texttt{\DIFadd{Choice}} \DIFadd{from Figure~\ref{choiceParam} implements the }\texttt{\DIFadd{Eq}} \DIFadd{type class. The ``}\texttt{\DIFadd{(Eq a) =>}}\DIFadd{'' clause in the instance declaration is a constraint on the type of }\texttt{\DIFadd{a}}\DIFadd{. This means that for a }\texttt{\DIFadd{Choice}} \DIFadd{to be ``}\texttt{\DIFadd{Eq}}\DIFadd{-able'' }\texttt{\DIFadd{a}} \DIFadd{must be as well.
}
\begin{figure}[t]
\begin{lstlisting}
class Eq a where
(==) :: a -> a -> Bool
(/=) :: a -> a -> Bool
instance (Eq a) => Eq (Choice a) where
Yes x == Yes y = x == y
No == No = True
_ == _ = False
c1 /= c2 = not (c1 == c2)
\end{lstlisting}
\caption{\DIFaddFL{The }\texttt{\DIFaddFL{Eq}} \DIFaddFL{type class and }\texttt{\DIFaddFL{Choice}}\DIFaddFL{'s implementation of it.}}
\label{choiceEq}
\end{figure}
\DIFadd{Haskell gives its users very powerful tools to help construct their own types and define those types' properties. These tools allow for many powerful concepts to be defined abstractly making them reusable. One of the more well known abstract concepts of the Haskell language are monads which are the subject of the next section.
}
\subsection{Monads}
\DIFadd{Monads are a type class used to represent ``composable computation descriptions''~\mbox{%DIFAUXCMD
\citep{haskellWikiMonad}}\hspace{0pt}%DIFAUXCMD
. Haskell uses monads to augment pure computations with features that other languages would allow as side effects such as state or I/O. Haskell has a built in monad type class, whose declaration is shown in Figure~\ref{monadTC}. Instances of monad need to implement two functions: }\texttt{\DIFadd{return}} \DIFadd{which will create a monadic computation that will produce its parameter and }\texttt{\DIFadd{>>=}} \DIFadd{(which is pronounced bind). Bind's first argument is some monadic value with a result of type }\texttt{\DIFadd{a}} \DIFadd{and its second argument is a function that takes in a value of type }\texttt{\DIFadd{a}} \DIFadd{and returns another monadic value that computes a result of type }\texttt{\DIFadd{b}}\DIFadd{. Bind computes the result of type }\texttt{\DIFadd{a}} \DIFadd{from the monadic value it was given in its first argument and passes that value to the function it received as its second argument. This description of bind only applies when its first argument is capable of computing some result or when the monad instance does not need to perform any additional tasks. This is the great power of monads as an abstraction, because bind can be implemented in almost any way}\footnote{\DIFadd{All instances of monad should abide by the three monad laws~\mbox{%DIFAUXCMD
\citep{wadler1995Monads}}\hspace{0pt}%DIFAUXCMD
}} \DIFadd{it can act as a way to specify how a particular set of computations are performed.
}
\DIFadd{The }\texttt{\DIFadd{Choice}} \DIFadd{type that was defined in the previous section is really just a copy of the }\texttt{\DIFadd{Maybe}} \DIFadd{type which is commonly used to represent computations that may fail or return a result. Instead of }\texttt{\DIFadd{Yes}} \DIFadd{and }\texttt{\DIFadd{No}} \texttt{\DIFadd{Maybe}}\DIFadd{'s constructors are }\texttt{\DIFadd{Just}} \DIFadd{and }\texttt{\DIFadd{Nothing}} \DIFadd{respectively. }\texttt{\DIFadd{Maybe}} \DIFadd{is a monad and in its case bind is used to cause a chain of computations to return }\texttt{\DIFadd{Nothing}} \DIFadd{if one of the steps of the computation fail. This allows for functions that could potentially fail to return a result to be composed without having to check if their parameter is a }\texttt{\DIFadd{null}} \DIFadd{value. Figure~\ref{maybeChain} shows this in practice. Each of the }\texttt{\DIFadd{f\_}} \DIFadd{functions can be written with the assumption that their parameter exists, only the top level function, }\texttt{\DIFadd{g}}\DIFadd{, needs to check if the }\texttt{\DIFadd{chain}} \DIFadd{function has failed.
}
\begin{figure}[t]
\begin{lstlisting}
class Applicative m => Monad m where
return :: a -> m a
(>>=) :: forall a b. m a -> (a -> m b) -> m b
\end{lstlisting}
\caption{\DIFaddFL{The }\texttt{\DIFaddFL{Monad}} \DIFaddFL{type class declaration}}
\label{monadTC}
\end{figure}
\begin{figure}[t]
\begin{lstlisting}
chain :: a -> Maybe a
chain a = f_1 a >>= f_2 >>= f_3 >>= f_4
g a = case (chain a) of
(Just b) -> putStrLn ("Got value: " ++ (show b))
Nothing -> putStrLn "Something went wrong"
\end{lstlisting}
\caption{\DIFaddFL{A chain of functions composed using bind. Each }\texttt{\DIFaddFL{f\_}} \DIFaddFL{function is of type }\texttt{\DIFaddFL{a -> Maybe a}}}
\label{maybeChain}
\end{figure}
\DIFadd{Monads are a powerful abstraction but it is also useful to further categorise monads that exhibit different properties. Haskell comes with type classes, that inherit from }\texttt{\DIFadd{Monad}}\DIFadd{, and that categorise additional features monads can have, such as failure (}\texttt{\DIFadd{MonadFail}}\DIFadd{) or failure and recovery (}\texttt{\DIFadd{MonadPlus}}\DIFadd{)~\mbox{%DIFAUXCMD
\citep{typeclassopedia}}\hspace{0pt}%DIFAUXCMD
. }\texttt{\DIFadd{MonadPlus}} \DIFadd{is a type class that represents the type of computations that may fail and also provide some way of choosing between possibly failed computations.
}
\DIFadd{In addition to the type classes that further categorise monads, monads themselves inherit from other more general types: functors and applicative functors. Functors are the set of types that can mapped over, in the case of lists the values can be transformed to another value by mapping some function over the list. Applicative functors are the set of types that can have sequential actions performed over them. Applicative functors will be discussed in more detail in Chapter~\ref{chp:applicative}.
}
\DIFadd{The Haskell community has built a large system of types around monad for representing all different sorts of computations. This system of types provides a fertile environment for data-driven refactorings, the topic of this thesis.
}
\DIFaddend \section{Refactoring}
Refactoring is the process of changing a program without changing its behaviour. This is done to improve its internal structure~\citep{fowler}. The term refactoring was \DIFdelbegin \DIFdel{coined in 1991 in}\DIFdelend \DIFaddbegin \DIFadd{first coined in}\DIFaddend ~\citep{programRestructuring} and these ideas expanded to the object-oriented paradigm in~\citep{refactOOFrameworks}. \DIFaddbegin \DIFadd{Martin Fowler's 1999 book ``Refactoring'' has become the canonical reference for object-oriented refactorings~\mbox{%DIFAUXCMD
\citep{fowler}}\hspace{0pt}%DIFAUXCMD
. }\DIFaddend People have been performing refactoring for much longer and it was called ``program restructuring'' in the literature(\cite{highSpeedRestructuring}\DIFdelbegin \DIFdel{, }\DIFdelend \DIFaddbegin \DIFadd{; }\DIFaddend \cite{performanceRestructuring}).
\DIFdelbegin \DIFdel{Martin Fowler's 1999 book ``Refactoring'' has become the canonical reference for object-oriented refactorings~\mbox{%DIFAUXCMD
\citep{fowler}}\hspace{0pt}%DIFAUXCMD
.
}\DIFdelend
Behaviour preservation is what separates refactoring from other types of program manipulation. This idea of \DIFdelbegin \DIFdel{functionality preservation means }\DIFdelend \DIFaddbegin \DIFadd{behaviour preservation is }\DIFaddend that refactoring will not introduce new bugs or eliminate old ones. To prevent semantic changes after refactoring, many refactorings have non-trivial preconditions~\citep{mens2002formalising}. For example, the \DIFdelbegin \DIFdel{renaming refactoring }\DIFdelend \DIFaddbegin \DIFadd{refactoring that renames an item (such as a function or variable) }\DIFaddend should check that the new name being introduced does not cause a name clash with a name already used in the source program.
Manual refactoring \DIFdelbegin \DIFdel{is }\DIFdelend \DIFaddbegin \DIFadd{can be }\DIFaddend tedious and error prone because changes to small portions of code may require system-wide changes. When deleting a parameter from a function, for example, every call site of that function also needs to be modified, and missing even a single call site will cause an error. Refactoring by hand depends on high testing coverage to ensure that functionality is preserved~\citep{fowler}. This means that tools that can automatically perform refactorings and ensure that preconditions are met are highly desirable.
\subsection{Functional refactoring}
Refactoring a functional language has a few key differences from refactoring an imperative language. The higher-order nature \DIFaddbegin \DIFadd{of }\DIFaddend functional languages means that any sub-expression of a function is a candidate for generalisation whereas in other languages the types of parameters and results \DIFdelbegin \DIFdel{is limited. }\DIFdelend \DIFaddbegin \DIFadd{are limited. This means that functional refactorings can target much more of the source program's abstract syntax tree. }\DIFaddend The semantics of functional languages also allow for more comprehensive checking of preconditions based on the static semantics of the language~\citep{refacTools}.
It is also not unusual for functional refactorings to be substantially different than their object-oriented (OO) counterparts. For example creating a \DIFdelbegin \DIFdel{case }\DIFdelend \DIFaddbegin \texttt{\DIFadd{case}} \DIFaddend statement from a multi-equation function definition in a functional language versus inlining \DIFdelbegin \DIFdel{a virtual method as a case }\DIFdelend \DIFaddbegin \DIFadd{virtual methods into a }\texttt{\DIFadd{case}} \DIFaddend statement in an OO language require substantially different program manipulations~\citep{huiqingThesis}. \DIFaddbegin \DIFadd{The first refactoring, the inlining of a multi-equation function, involves moving the pattern matches from each of the target function's equations to the left hand side patterns of the new }\texttt{\DIFadd{case}} \DIFadd{expression. The object oriented refactoring needs to look up all of the implementations of the virtual method, inline their bodies into the new target method, and build a case statement that switches depending on which concrete type the new method is called invoked in. Both of these refactorings are consolidating some conditional logic into a single function,}\footnote{\DIFadd{If we consider methods to be functions that happen to be associated with a particular object.}} \DIFadd{but the syntax elements that need to traversed and constructed to build the target program are very different. }\DIFaddend Additionally there can be refactorings with no OO counterpart, monadification the introduction of monadic types into otherwise pure code \DIFdelbegin \DIFdel{, }\DIFdelend for instance.
\section{Summary}
One of the most widely accepted best practices in software development is the concept of incremental change, an essential concept of both \DIFdelbegin \DIFdel{~\mbox{%DIFAUXCMD
\citep{agileManifesto} }\hspace{0pt}%DIFAUXCMD
and ~\mbox{%DIFAUXCMD
\citep{extremeProg}}\hspace{0pt}%DIFAUXCMD
}\DIFdelend \DIFaddbegin \DIFadd{the extreme programming and agile software development philosophies (\mbox{%DIFAUXCMD
\cite{extremeProg}}\hspace{0pt}%DIFAUXCMD
; \mbox{%DIFAUXCMD
\cite{agileManifesto}}\hspace{0pt}%DIFAUXCMD
)}\DIFaddend . Refactoring remains a key step in this incremental development process to preventing technical debt from causing development to slow to a crawl.
Much of the refactoring literature focuses on changes that need to be made to the structure of programs. The structure of a program is very important and structural refactorings can maintain the ``separation of concerns'' by extracting functions from existing definitions or ensuring that a \DIFdelbegin \DIFdel{programs }\DIFdelend \DIFaddbegin \DIFadd{program's }\DIFaddend names reflect what the program currently does. This thesis argues that it is just as important to maintain the ways that a program structures and evaluates the data it computes.
This thesis has chosen to use the term ``data-driven'' to describe the type of refactorings that are prompted by the data that a program computes. These refactorings can be prompted by an insufficiently fine-grained data model. For example, the ``introduce type synonym'' refactoring \DIFdelbegin \DIFdel{separates }\DIFdelend \DIFaddbegin \DIFadd{which creates a new type synonym that helps separate }\DIFaddend certain instances of a type that are used to represent different \DIFdelbegin \DIFdel{things}\DIFdelend \DIFaddbegin \DIFadd{concepts}\DIFaddend .
This thesis also takes advantage of Haskell's call-by-need evaluation strategy which allows for control flow to be abstracted by the user. The ``generalise monad to applicative refactoring'' (see Chapter~\ref{chp:applicative}) takes code that was formally monadic and sequentially evaluated and makes it possible to evaluate it in parallel. The ``generalise maybe'' refactorings from \DIFdelbegin \DIFdel{section~\ref{genMaybe} , }\DIFdelend \DIFaddbegin \DIFadd{Section~\ref{genMaybe} }\DIFaddend takes a concrete effect and makes it an abstract one that can be instantiated in multiple ways. Both of these refactorings are applied as either the programmer's understanding of the data they are working with becomes more nuanced or to prepare the program's data model for enhancement. In this way the data ``drives'' these refactorings.
\section{Thesis Outline}
This thesis will proceed as follows:
\DIFaddbegin \textbf{\DIFadd{Chapter~\ref{chp:related}: Related work}}
\DIFadd{This thesis begins with chapter that discusses the some of the related work that this thesis builds from and is inspired by.
}
%DIF > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\DIFadd{This chapter covers other refactoring tools for other functional and object-oriented languages. This chapter also discusses multiple projects that are using refactoring in unique and interesting ways such as introducing parallelism, and non-traditional programming languages. Next the chapter describes work on developing syntactic sugar for applicative functors. Finally there is a brief discussion of type changing program transformation systems.
}
\DIFaddend \textbf{Chapter~\ref{chp:hare}: Background: Refactoring Haskell in HaRe}
This chapter covers the history of HaRe \DIFaddbegin \DIFadd{(the }\textbf{\DIFadd{Ha}}\DIFadd{skell }\textbf{\DIFadd{Re}}\DIFadd{factorer)}\DIFaddend , the technologies it depends on, and the current implementation of HaRe. Specifically there is an overview of the\DIFdelbegin \DIFdel{GHC API}\DIFdelend ~\citep{ghcApi}, the generic traversal library Scrap Your Boilerplate~\citep{syb}, and ghc-exactprint~\citep{exactprint}. This chapter also describes how the inner workings of HaRe are implemented and the functions that compose its API. Finally it discuses the design and development process of implementing refactorings for HaRe.
\textbf{Chapter~\ref{chp:ddRefs}: Data-Driven Refactorings}
Chapter~\ref{chp:ddRefs} introduces data-driven refactorings. The chapter begins with a discussion of this type of refactoring for object-oriented languages. The rest of the chapter describes data-driven refactorings for the Haskell programming language. First there is a description of the ``introduce a type synonym\DIFaddbegin \DIFadd{,}\DIFaddend '' the ``renaming'' refactoring of data-driven refactorings. The \DIFaddbegin \DIFadd{renaming refactoring is the most straightforward refactoring, it simply changes the name of a variable to one that better suits the real meaning of the variable. Similarly a type synonym can rename types to better reflect what certain instances of that type are being used for in a program, and this refactoring supports that transformation. The }\DIFaddend other two refactorings covered in this chapter are the ``generalising maybe'' and ``list to \DIFdelbegin \DIFdel{hughes }\DIFdelend \DIFaddbegin \DIFadd{Hughes }\DIFaddend list'' refactorings.
The ``generalising maybe'' refactoring \DIFaddbegin \DIFadd{(Section~\ref{genMaybe}) }\DIFaddend rewrites functions that use the concrete type of \texttt{Maybe} to \DIFdelbegin \DIFdel{instead use}\DIFdelend \DIFaddbegin \DIFadd{use, instead, }\DIFaddend the operations provided by the \DIFdelbegin \DIFdel{typeclasses it implements}\DIFdelend \DIFaddbegin \DIFadd{type classes it implements, }\DIFaddend \texttt{Monad} and/or \texttt{MonadPlus}. The final refactoring described by this chapter is the ``list to Hughes list'' refactoring \DIFaddbegin \DIFadd{(Section~\ref{listToDlist})}\DIFaddend . Hughes lists (also known as difference lists) are an alternative implementation for lists that support $O(n)$ time appends. This refactoring takes functions that use that standard list implementation and rewrites the function to use \DIFdelbegin \DIFdel{Hughe }\DIFdelend \DIFaddbegin \DIFadd{Hughes }\DIFaddend lists instead. The approach for this refactoring is applicable between any two types that are ``reversibly embeddable'' a concept that will be defined as well.
\textbf{Chapter~\DIFdelbegin \DIFdel{\ref{generalImp}}\DIFdelend \DIFaddbegin \DIFadd{\ref{chp:generalImp}}\DIFaddend : Implementing Data-Driven Refactorings in HaRe}
This chapter continues \DIFdelbegin \DIFdel{on }\DIFdelend from the refactoring designs presented in chapter~\ref{chp:ddRefs} to describe HaRe's implementation of both the ``\DIFdelbegin \DIFdel{Maybe }\DIFdelend \DIFaddbegin \DIFadd{maybe }\DIFaddend to MonadPlus'' and the ``\DIFdelbegin \DIFdel{List }\DIFdelend \DIFaddbegin \DIFadd{list }\DIFaddend to Hughes List'' refactorings. There is also be a discussion of the API that supports the ``\DIFdelbegin \DIFdel{List }\DIFdelend \DIFaddbegin \DIFadd{list }\DIFaddend to Hughes List'' refactoring and can be used to define further ``reversibly embeddable'' type refactorings. Finally this chapter will describe the enhancements made to HaRe's API, specifically the addition of \DIFdelbegin \DIFdel{high level }\DIFdelend \DIFaddbegin \DIFadd{high-level }\DIFaddend transformation functions.
\textbf{Chapter~\ref{chp:applicative}: Generalising Monads to Applicative}
This chapter presents another generalisation refactoring. Applicative functors are a, relatively, new addition to the Haskell environment. Applicative functors are an interface for sequencing effectful computations. Currently the Haskell community predominantly uses the monadic interface for effects. This chapter will describe the design and implementation of a refactoring for taking a monadic \texttt{do} statement and transforming it to use the \texttt{Applicative} (the Haskell \DIFdelbegin \DIFdel{typeclass }\DIFdelend \DIFaddbegin \DIFadd{type class }\DIFaddend for Applicative functors) interface instead.
This chapter also describes how the Haskell community is currently using the \texttt{Applicative} interface, based on the results of a survey of Hackage\DIFaddbegin \DIFadd{, }\DIFaddend the Haskell package archive~\citep{hackage}. Finally this chapter concludes with a discussion of possible applications of the refactoring.
\textbf{Chapter~\ref{chp:monadification}: Introducing Monads}
Chapter~\ref{chp:monadification} describes the monadification refactoring. Monadification is the process of introducing monads into pure code. Monads are the standard way for effects to be used in Haskell and so a refactoring to automatically add them is very useful to the community. There are many styles of monadification and this chapter describes several of them. Finally it discusses the implementation of the monadification refactoring in HaRe.
\DIFdelbegin \textbf{\DIFdel{Chapter~\ref{chp:related}: Related work}}
%DIFAUXCMD
%DIFDELCMD <
%DIFDELCMD < %%%
\DIFdel{The final content chapter discusses the literature related to this thesis. This chapter covers other refactoring tools for other functional and object-oriented languages. This chapter also discusses multiple projects that are using refactoring in unique and interesting ways such as introducing parallelism, and non-traditional programming languages. Next the chapter describes work on developing syntactic sugar for applicative functors. Finally there is a brief discussion of type changing program transformation systems.
}%DIFDELCMD <
%DIFDELCMD < %%%
\DIFdelend \textbf{Chapter~\ref{chp:conc}: Conclusion}
The final chapter summarises the work done for this thesis and the contributions it has made. It concludes with a discussion of future work that could be performed on HaRe.
\section{Contributions of this Research}
The work in this thesis was carried out in HaRe\DIFdelbegin \DIFdel{(the }\textbf{\DIFdel{Ha}}%DIFAUXCMD
\DIFdel{skell }\textbf{\DIFdel{Re}}%DIFAUXCMD
\DIFdel{factorer)}\DIFdelend . This study focused on adding additional refactorings to HaRe of a new type. Rather than being motivated by the structural problems of a program\DIFaddbegin \DIFadd{, }\DIFaddend data-driven refactorings seek to resolve issues that are caused by the data types a program uses. The contributions of this research are:
\begin{itemize}
\item Extending the HaRe API to better support data-driven refactorings. These refactorings are complex and require more information from the abstract syntax of GHC than prior refactorings in HaRe. The contributions to the API were focused on analysis of the types of nodes and creating a higher level interface for common small expression level transformations. These changes are described in \DIFdelbegin \textbf{\DIFdel{chapter~\ref{generalImp}}}%DIFAUXCMD
\DIFdelend \DIFaddbegin \DIFadd{Chapter~\ref{chp:generalImp}}\DIFaddend .
\item The design and implementation of the ``generalise maybe'' and ``list to Hughes list'' refactoring, described in \DIFdelbegin \DIFdel{chapters}\DIFdelend \DIFaddbegin \DIFadd{Chapters}\DIFaddend ~\ref{chp:ddRefs} and ~\DIFdelbegin \DIFdel{\ref{generalImp}}\DIFdelend \DIFaddbegin \DIFadd{\ref{chp:generalImp}}\DIFaddend . The ``generalise maybe'' refactoring is a way of taking a concrete effect and turning it into an abstract one so that it can be instantiated in multiple ways. The ``list to Hughes list'' refactoring describes a way to rewrite a program's data model to use a different type with a similar interface.
\item The design and implementation of the ``\DIFdelbegin \DIFdel{Generalise Monads to Applicative}\DIFdelend \DIFaddbegin \DIFadd{generalise monads to applicative}\DIFaddend '' refactoring, \DIFdelbegin \DIFdel{chapter}\DIFdelend \DIFaddbegin \DIFadd{Chapter}\DIFaddend ~\ref{chp:applicative}. Effects in Haskell are typically handled using monads, a powerful abstraction that allows the type system to check the type of the effects that are being performed by a program. Since the introduction of monads the Haskell community has developed more fine-grained approaches to effect handling. This refactoring allows software systems to handle effects using the applicative as opposed to the monadic interface.
\item The \DIFdelbegin \DIFdel{design and }\DIFdelend implementation of the ``monadification'' refactoring, see \DIFdelbegin \DIFdel{chapter}\DIFdelend \DIFaddbegin \DIFadd{Chapter}\DIFaddend ~\ref{chp:monadification}. Monadification is the process of making a program work over a monadic type rather than a pure one. This refactoring supports the \DIFdelbegin \DIFdel{evolution }\DIFdelend \DIFaddbegin \DIFadd{transformation }\DIFaddend of pure programs to effectful ones.
\end{itemize}
%DIF > %%%%%%%%%%%%%%%%%%%%%%%%%
\DIFaddbegin
%DIF > %%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{\DIFadd{Related Work}}
\label{chp:related}
\DIFadd{This chapter reviews current work in refactoring and other areas of the literature and helps to situate the work described in this thesis into a broader landscape. This chapter begins, in Section~\ref{ideTools}, with a short description of refactoring tools, including the sophisticated modern IDEs that support object-oriented languages and the refactoring tools that support functional languages.
}
\DIFadd{Next Sections~\ref{refacParallel} and~\ref{applicativeDo} will describe some of the work done to transform programs so that they are executed in parallel rather than sequentially. The original idea behind refactoring was to reduce the technical debt of a target program. Technical debt is the idea that when a system is being implemented there can be quick and messy ways to implement things, but this implementation will incur ``debt'' on the project. Much like financial debt some debt is needed to get a project off the ground but also like financial debt if debt is allowed to grow unchecked it can grind a project to a halt~\mbox{%DIFAUXCMD
\citep{techDebt}}\hspace{0pt}%DIFAUXCMD
. Refactoring, was traditionally the process of ``paying back'' this debt, however the potential scope of refactorings has been expanded to include, for example, introducing parallelism rather than just improving code quality.
}
\DIFadd{Section~\ref{typeTrans} covers work done in the program transformation and refactoring fields. In particular earlier versions of monadification are discussed, such as ``Reuse by Program Transformation''~\mbox{%DIFAUXCMD
\citep{lammelReuse}}\hspace{0pt}%DIFAUXCMD
, and ``Monadification of Functional Programs''~\mbox{%DIFAUXCMD
\citep{monadification}}\hspace{0pt}%DIFAUXCMD
. Other work that is covered in this section includes work on refactoring and program transformation that is particularly focused on types.
}
\DIFadd{The final section of this chapter discusses how refactoring tools are implemented. This section pays special attention to refactoring tools that target languages other than Haskell, because the focus of Chapter~\ref{chp:hare} is on building refactoring tools for Haskell.
}
\section{Refactoring Tools in Modern IDEs}\label{ideTools}
\DIFadd{Refactoring tools have become a standard feature in integrated development environments. The four most popular IDEs for object-oriented languages, Eclipse}\footnote{\url{https://www.eclipse.org/}}\DIFadd{, NetBeans}\footnote{\url{https://netbeans.org/}}\DIFadd{, IntelliJ}\footnote{\url{https://www.jetbrains.com/idea/}}\DIFadd{, and Visual Studio}\footnote{\url{https://www.visualstudio.com/}} \DIFadd{all come with refactoring tools for their primary language~\mbox{%DIFAUXCMD
\citep{ides}}\hspace{0pt}%DIFAUXCMD
. These refactoring tools support some general refactorings (renaming, method extraction) and some that are specific to object-oriented languages (pushing/pulling methods up/down the object hierarchy).
}
\DIFadd{Modern IDEs, however, were not designed with functional programming in mind. Eclipse, Netbeans, and Intellij were all built to support Java and Visual Studio supports multiple languages most prominently C++ and C\#.}\footnote{\DIFadd{Visual Studio also includes support for F\# out of the box, but this could be considered the exception that proves the rule.}} \DIFadd{These tools allow community developed plugins to expand the languages they can support but the plugin's can fall out of date, for example the Haskell Eclipse plugin was stopped being supported in 2015.}\footnote{\DIFadd{See: }\url{https://wiki.haskell.org/IDEs\#EclipseFP_plugin_for_Eclipse_IDE}} \DIFadd{Though modern IDEs are heavily depended on in object-oriented development functional programming languages have developed their own, mostly independent tooling ecosystems. This will be the focus of the next section, refactoring tools for functional programming languages.
}
\section{Refactoring Tools for functional languages}\label{funcTools}
\DIFadd{A reason that often used to be given to explain why functional languages are not in widespread use in industry is the lack of a robust tooling ecosystem~\mbox{%DIFAUXCMD
\citep{wadlerTools}}\hspace{0pt}%DIFAUXCMD
. This is no longer the case as functional language ecosystems have undergone a great deal of development in recent years and, maybe coincidently, use in industry has gone up substantially in the last five years. This section will cover some existing refactoring and code smell tools for functional programming languages.
}
\subsection{HLint}
\DIFadd{HLint is a ``code smell'' tool for Haskell. Poorly designed code often produces ``smells,'' apparently superficial problems that indicate deeper design issues~\mbox{%DIFAUXCMD
\citep{fowler}}\hspace{0pt}%DIFAUXCMD
. One of the most common of these smells is duplicated code. Other hints include simplifying boolean expressions to remove unneeded calls to }\texttt{\DIFadd{not}} \DIFadd{(e.g. ``}\texttt{\DIFadd{not (a == b)}}\DIFadd{'' should become ``}\texttt{\DIFadd{(a /= b)''}}\DIFadd{), or replacing common types of folds with their prelude defined names (e.g. }\texttt{\DIFadd{sum}} \DIFadd{can replace }\texttt{\DIFadd{foldr (+) 0}}\DIFadd{). A code smell tool suggests changes to a code base such as alternative functions to use, how to simplify code, and redundancies~\mbox{%DIFAUXCMD
\citep{hlint}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{Code smell and refactoring tools are very closely related. Simplistically a code smell tool detects problems in a code base and a refactoring tool fixes them. If a tool can detect a problem why can't the same program fix them? HLint has a }\texttt{\DIFadd{-refactor}} \DIFadd{flag that will automatically apply the suggestions. However there can be a problem in doing this, e.g. a single piece of code could have multiple smells, how would HLint choose which one to apply? Also once a transformation has been applied other hints may no longer be applicable. HLint's behaviour in these cases is not documented.
}
\DIFadd{One of the powerful features of HLint is its customizability. An HLint configuration file (called }\texttt{\DIFadd{hlint.yaml}}\DIFadd{) added to the root of a project will be detected by HLint and it will suggest both the default hints as well as the custom hints from that file. Hints are very simple to write.
}
\begin{figure}[t]
\begin{lstlisting}
- hint: {lhs: x !! 0, rhs: head x}
\end{lstlisting}
\caption{\DIFaddFL{A simple hint from~\mbox{%DIFAUXCMD
\citep{hlint}}\hspace{0pt}%DIFAUXCMD
}}
\label{lstHint}
\end{figure}
\DIFadd{Figure~\ref{lstHint} contains the definition of a hint that detects the list index operator is being used to look up the 0th element of a list and suggests using }\texttt{\DIFadd{head}} \DIFadd{instead. The }\texttt{\DIFadd{lhs}} \DIFadd{tag is the code HLint will search for. If code matching }\texttt{\DIFadd{lhs}} \DIFadd{is found HLint will suggest the code be replaced with the }\texttt{\DIFadd{rhs}} \DIFadd{code. HLint assumes any single character variable is a substitution parameter. Given the hint from Figure~\ref{lstHint} and the following code:
}
\begin{lstlisting}
f list = list !! 0
\end{lstlisting}
\DIFadd{HLint produces the following output.
}
\begin{lstlisting}
Suggestion: Use head
Found:
list !! 0
Why not:
head list
1 hint
\end{lstlisting}
\DIFadd{A major limitation of HLint is that it is only aware of a single module at a time. HLint is not aware of what types or names are in scope and code smells that span multiple modules cannot be discovered.
}
\subsection{Haskell Tools Refact}
\DIFadd{HaRe is not the only refactoring tool for Haskell. In late 2016 Haskell Tools Refact was announced and is currently at version 0.7~\mbox{%DIFAUXCMD
\citep{haskellTools}}\hspace{0pt}%DIFAUXCMD
. The Haskell Tools project is a GHC based developer tool kit for writing transformations~\mbox{%DIFAUXCMD
\citep{haskellToolsGit}}\hspace{0pt}%DIFAUXCMD
. There are eight refactorings currently supported}\footnote{\DIFadd{May 2018}}\DIFadd{.
}
\begin{itemize}
\item \DIFadd{Rename
}\item \DIFadd{Generate type signature
}\item \DIFadd{Generate exports
}\item \DIFadd{Extract binding
}\item \DIFadd{Inline binding
}\item \DIFadd{Organize imports
}\item \DIFadd{Float out
}\item \DIFadd{Organize extensions
}\end{itemize}
\DIFadd{Haskell Tools has implemented its own abstract syntax tree. The AST of Haskell Tools is generated using information from all of GHC's compiler stages. Each node represents the same language elements; it just includes additional information that is spread across the different stages of the GHC~\mbox{%DIFAUXCMD
\citep{haskellTools}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{The Haskell Tools refactorer is currently integrated into the Atom editor}\footnote{\url{https://atom.io/}} \DIFadd{with Sublime Text}\footnote{\url{https://www.sublimetext.com}} \DIFadd{support planned for the near future~\mbox{%DIFAUXCMD
\citep{haskellTools}
}\hspace{0pt}%DIFAUXCMD
}
\subsection{Wrangler}\label{wranglerOne}
\DIFadd{Wrangler is a refactoring and code inspection tool for Erlang~\mbox{%DIFAUXCMD
\citep{wrangler}}\hspace{0pt}%DIFAUXCMD
. Erlang is a functional programming language designed to be massively scalable and highly fault tolerant~\mbox{%DIFAUXCMD
\citep{erlang}}\hspace{0pt}%DIFAUXCMD
. It was originally developed in 1986 by Joe Armstrong, Robert Virding, and Mike Williams at the Computer Science Laboratory at Ericsson Telecom AB~\mbox{%DIFAUXCMD
\citep{erlangHistory}}\hspace{0pt}%DIFAUXCMD
. Erlang's core design tenets include lightweight processes, that communicate through message passing. Erlang also boasts a ``let it fail" error handling architecture, when a process fails the error is not handled by the process where it occurred, but is handled by a separate dedicated part of the program~\mbox{%DIFAUXCMD
\citep{armstrongThesis}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{Wrangler is accessible from the command line and has been integrated into both Emacs and Eclipse. It currently supports a large library of refactorings, code smells, as well as other program analysis tools such as clone detection and automatic API migration~\mbox{%DIFAUXCMD
\citep{wrangler}}\hspace{0pt}%DIFAUXCMD
. Additionally Wrangler supports a template-based API and a domain specific language which allow users to define their own refactorings and script composite refactorings~\mbox{%DIFAUXCMD
\citep{wranglerDomain}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{The template-based API of Wrangler allows users to define program analyses and transformations using Erlang concrete syntax. Wrangler templates consist of fragments of Erlang syntax that may contain meta-variables or meta-atoms that can stand for any language element. Meta-variables/atoms are variables or atoms that end with the "}\texttt{}\DIFadd{" character; this meta-variable/atom binds a language element to its name so that that element can be referred to by name in the definition of the refactoring. Meta-variables/atoms that end with "}\texttt{}\DIFadd{" are list meta-variables/atoms that match a sequence of language elements as long as they are of same sort~\mbox{%DIFAUXCMD
\citep{letsUser}}\hspace{0pt}%DIFAUXCMD
.}\footnote{\DIFadd{Things like the arguments to a function or a sequence of expressions in a function body are the same "sort."}}
\begin{figure}[t]
\begin{lstlisting}[language=erlang]
?T("erlang:spawn(Arg@)")
?T("erlang:spawn(Arg@@)")
\end{lstlisting}
\caption{\DIFaddFL{Some Wrangler templates}}
\label{templates}
\end{figure}
\DIFadd{The first template in Figure~\ref{templates} matches applications of }\texttt{\DIFadd{erlang:spawn}} \DIFadd{when it is called with one argument whereas the second template will match the same function with any number of arguments.
}
\DIFadd{Composite refactorings are refactorings that are made up of multiple refactorings run, in sequence, one after the other. It can be challenging to develop composite refactorings if they are not explicitly handled by the refactoring tool. The naive solution just chains refactorings together with the output from one refactoring in a composite refactoring becoming the input to the next refactoring. However, what if the second refactoring fails in a chain of four? Composite refactoring definitions, without tool support, become filled with error handling code to manage the situation when one of the component refactorings fail. Wrangler defines a domain specific language that helps describe the various facets of a composite refactoring.
}
\DIFadd{The Wrangler DSL supports the creation of a composite refactoring through a variety of features. First, Wrangler extends every primitive refactoring with a }\textit{\DIFadd{refactoring command generator}}\DIFadd{. A command generator allows the extended refactoring to accept not just concrete values but also structures that specify how the parameter should be generated; each parameter of a command generator accepts either a concrete value, a condition that checks if a value is satisfactory, or a generator for creating the parameter based on the previous parameters. A refactoring for renaming functions named with the format }\texttt{\DIFadd{camelCase}} \DIFadd{to }\texttt{\DIFadd{camel\_case}} \DIFadd{would accept three arguments: the target filename, the name of the target function, and the desired new name. This command generator's first parameter is a condition that always returns true because any file is a valid target for renaming. The second parameter is another condition that checks if the function name matches is in camel case format (e.g. "}\texttt{\DIFadd{aFunName}}\DIFadd{"). The final parameter is generated by taking the second parameter and modifying it so that the name is in the corresponding "snake case" format (e.g. "}\texttt{\DIFadd{a\_fun\_name}}\DIFadd{").
}
\DIFadd{The DSL also allows decision making to occur during the execution of a composite refactoring. Composite refactorings are transactional and can be either atomic or non-atomic. Atomic composite refactorings require each component refactoring to be successfully applied before continuing onto the next refactoring. If a single refactoring fails inside of an atomic composite refactoring, the entire refactoring fails and the program remains unchanged. When a single refactoring fails inside a non-atomic composite refactoring, correspondingly, the entire refactoring will not fail and continue by trying the next refactoring in the sequence. The Wrangler DSL allows for refactorings to described as atomic and non-atomic sections at each level.
}
\subsection{RefactorErl}\label{refactorErl}
\DIFadd{Another notable tool for the Erlang language is RefactorErl. RefactorErl started out as another refactoring tool for Erlang but has since expanded into a source code analysis and transformation tool~\mbox{%DIFAUXCMD
\citep{refactorErl}}\hspace{0pt}%DIFAUXCMD
. RefactorErl uses a ``semantic program graph'' to represent an Erlang source program, this graph is broken up into three layers~\mbox{%DIFAUXCMD
\citep{erlangStatic}}\hspace{0pt}%DIFAUXCMD
:
}
\begin{enumerate}
\item \DIFadd{Lexical layer: This is where token, spacing, and comment information is kept about the source program.
}\item \DIFadd{Syntactic layer: This layer keeps the abstract syntax tree of the source program.
}\item \DIFadd{Semantic layer: This contains additional calculated semantic information about the source program, such as module and function references as well as variable bindings.
}\end{enumerate}
\DIFadd{Refactoring tools, in general, only require the information contained within these first two layers, the semantic layer helps RefactorErl implement its static analysis capabilities. The semantic program graph is constructed after a source program's abstract syntax has been obtained from the Erlang language front end. The semantic layer is built on top of the abstract syntax tree by several different static analysers that each add a different kind of information to the graph. For example, the function analyser adds a semantic function node when the first reference to or the definition of a function is found. Every reference to that function discovered after that points to the original node~\mbox{%DIFAUXCMD
\citep{erlangStatic}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{The edges in the semantic layer are also labeled so the relationships between nodes can be captured as well. The function analyser, for instance, labels the edges going from the semantic function node to internal references to that function with a }\textit{\DIFadd{funIref}} \DIFadd{label, and the }\textit{\DIFadd{fundef}} \DIFadd{label connects the semantic node to the function definition it represents. The semantic layer is critical for the static analysis RefactorErl performs but it is also quite useful for the refactorings as well. Consider the ``move a function'' refactoring}\footnote{\DIFadd{See: }\url{http://pnyf.inf.elte.hu/trac/refactorerl/wiki/RefactoringSteps/MoveFunction}}\DIFadd{, which moves a function definition from one module to another, every reference in the target function needs to be checked to ensure that those definitions are available in the new scope of the function. Without a semantic layer a refactoring tool may have to traverse the syntax tree multiple times to locate all the references the target function contains and ensure they are available in the function's new location. With the semantic layer, on the other hand, finding all of the references can be done in two steps over the graph (one step from the reference to the semantic node, the next step following the semantic node's ``definition'' edge, e.g. }\textit{\DIFadd{fundef}}\DIFadd{, }\textit{\DIFadd{fielddef}}\DIFadd{, etc.).
}
\DIFadd{RefactorErl currently supports 24 different refactorings as well as a suite of static analysis and program comprehension tools.
}
\subsection{ROTOR}
\DIFadd{A newcomer to the refactoring tools for functional language space is ROTOR}\footnote{\textbf{\DIFadd{R}}\DIFadd{eliable }\textbf{\DIFadd{O}}\DIFadd{Caml-base }\textbf{\DIFadd{T}}\DIFadd{ool for }\textbf{\DIFadd{O}}\DIFadd{Caml }\textbf{\DIFadd{R}}\DIFadd{efactoring}} \DIFadd{the first refactoring tool to target OCaml~\mbox{%DIFAUXCMD
\citep{rotor}}\hspace{0pt}%DIFAUXCMD
. Language features of OCaml provide some unique challenges for a refactoring tool. In OCaml one module may be included in another so that, for example, when renaming the function }\texttt{\DIFadd{f}} \DIFadd{in module }\texttt{\DIFadd{A}} \DIFadd{but }\texttt{\DIFadd{A}} \DIFadd{is included in module }\texttt{\DIFadd{B}} \DIFadd{then both }\texttt{\DIFadd{A.f}} \DIFadd{and }\texttt{\DIFadd{B.f}} \DIFadd{will need to be renamed. ROTOR handles this by decomposing refactorings into a set of textual replacement operations that can depend on other transformations~\mbox{%DIFAUXCMD
\citep{rotor}}\hspace{0pt}%DIFAUXCMD
. From the renaming example renaming ``}\texttt{\DIFadd{A.f}}\DIFadd{'' would depend on the ``}\texttt{\DIFadd{B.f}}\DIFadd{'' renaming succeeding and vice versa.
}
\DIFadd{An opportunity of the ROTOR project is that it has an partner in industry, Jane Street Capital. ROTOR is using the core library}\footnote{\url{https://github.com/janestreet/core}} \DIFadd{an "industrial strength" version of the OCaml standard library as a test bed for testing the refactoring tool.
}
\section{Refactoring to introduce parallelism}\label{refacParallel}
\DIFadd{The reasons to refactor source code have also expanded beyond code quality. This section will describe two different projects that have developed refactorings to change the execution of a program from single to multi-threaded. Functional programming languages are well suited to parallel execution due to immutability by default and in some languages (such as Erlang) first-class concurrency features. This section will first describe the ``ParaForming'' which uses refactoring to introduce parallel abstractions into Haskell code, then it will describe work done to refactor Erlang code to introduce algorithmic skeletons.
}
\subsection{ParaForming}
\DIFadd{ParaForming is an approach to construct parallel programs from an existing program using software refactoring~\mbox{%DIFAUXCMD
\citep{paraforming}}\hspace{0pt}%DIFAUXCMD
. The ParaForming work targets Glasgow parallel Haskell: ``GpH'', an extension to Haskell, and is implemented in HaRe. Parallelism is added to programs in GpH using strategies (see Figure~\ref{strategy}).
}
\begin{figure}[t]
\begin{lstlisting}
type Strategy a = a -> Eval a
\end{lstlisting}
\caption{\DIFaddFL{The strategy type}}
\label{strategy}
\end{figure}
\DIFadd{A strategy takes its argument and determines how it will be evaluated inside of the }\texttt{\DIFadd{Eval}} \DIFadd{monad. The }\texttt{\DIFadd{rpar}} \DIFadd{strategy introduces parallelism by "sparking" its argument. Sparks are tasks that are collected into a pool which is managed by the runtime. The spark pool is a source of work that GHC can pull from when there are idle processors. Sparks may be evaluated in parallel or not at all depending on the availability of spare cores.
}
\DIFadd{The simplest parallel refactoring is to introduce data parallelism. This refactoring is applied to an expression that works over a list and evaluates each member of that list in a spark. A sequential function that sums the Euler totient function is in Figure~\ref{eulerSeq} and the refactored program is in Figure~\ref{eulerPar1}
}
\begin{figure}[t]
\begin{lstlisting}
sumEulerSeq :: Int -> Int
sumEulerSeq n = sum (map euler (mkList n))
\end{lstlisting}
\caption{\DIFaddFL{A sequential calculation that sums the Euler totient function}}
\label{eulerSeq}
\end{figure}
\begin{figure}[t]
\begin{lstlisting}
sumEulerPar1 :: Int -> Int
sumEulerPar1 n = sum (map euler (mkList n) `using` parList rdeepseq)
\end{lstlisting}
\caption{\DIFaddFL{A refactored version of the function from Figure~\ref{eulerSeq}}}
\label{eulerPar1}
\end{figure}
\DIFadd{This refactoring evaluates the calculation of }\texttt{\DIFadd{map euler (mkList n)}} \DIFadd{using}\footnote{\DIFadd{The (}\texttt{\DIFadd{using :: a -> Strategy a -> a}}\DIFadd{) function just evaluates some expression with the given strategy.}} \DIFadd{the }\texttt{\DIFadd{parList rdeepseq}} \DIFadd{strategy. The }\texttt{\DIFadd{parlist}}\footnote{\texttt{\DIFadd{parlist :: Strategy a -> Strategy }[\DIFadd{a}]}} \DIFadd{function evaluates each element of a list in parallel according to a given strategy and }\texttt{\DIFadd{rdeepseq}} \DIFadd{is the strategy the fully evaluates its argument.
}
\DIFadd{The refactored program in Figure~\ref{eulerPar1} is highly parallel but not very efficient because the parallelism is too fine grained. Another refactoring can help in this case instead of sparking every element of a list another strategy can be introduced, one that separates the list into "chunks" and each of the chunks of the list is executed in parallel. This refactoring adds an additional argument to the function that determines how many chunks the list will be split into, as seen in Figure~\ref{eulerChunk}.
}
\begin{figure}[t]
\begin{lstlisting}
sumEulerChunk :: Int -> Int -> Int
sumEulerChunk c n = sum (map euler (mkList n) `using` parListChunk c rdeepseq)
\end{lstlisting}
\caption{\DIFaddFL{A "chunked" version of the function from Figure~\ref{eulerSeq}}}
\label{eulerChunk}
\end{figure}
\DIFadd{These two refactorings are both a way of introducing data parallelism with varying degrees of granularity. The other form of parallelism is known as task parallelism. Where data parallelism is focused on computing different parts of a data structure in parallel (the elements of a list in the previous case), task parallelism instead focuses on having different ``tasks'' excecuted in parallel. The work done in~\mbox{%DIFAUXCMD
\cite{paraforming} }\hspace{0pt}%DIFAUXCMD
outlines a refactoring that can make recursive calls happen in parallel.
}
\subsection{Cost-Directed Parallel Refactoring}
\DIFadd{The previous section touched on one of the big challenges of parallel programming, determining the correct level of parallelism to achieve maximum performance. \mbox{%DIFAUXCMD
\cite{parallelErl} }\hspace{0pt}%DIFAUXCMD
describes a methodology to introduce algorithmic skeletons into Erlang programs using the Erlang refactoring tool Wrangler. In addition to introducing a skeleton this work provides cost models that estimate the performance of the program after adding each skeleton. This estimate helps a programmer to make an informed decision about which parallelisation strategy is the best for a particular program.
}
\DIFadd{An algorithmic skeleton is a common parallel pattern. A skeleton is implemented as a higher-order function that takes in a sequential function and any parameters that the skeleton requires. \mbox{%DIFAUXCMD
\cite{parallelErl} }\hspace{0pt}%DIFAUXCMD
discusses the four most common and useful skeletons. For example, the map skeleton works by breaking up the target data into pieces that can be operated on in parallel. Finally the results from the the parallel computations are combined back into a single image. One of the examples presented in~\mbox{%DIFAUXCMD
\cite{parallelErl} }\hspace{0pt}%DIFAUXCMD
is an image processing system that denoises images. Denoising a section of an image can be done independently from processing the other sections of the same image. The introducing the map skeleton would break the image into pieces to be denoised in parallel then the outputted sections can be stitched back together again
}
\DIFadd{Skeletons are simple to understand in theory but it can be difficult to know which to apply in practice. This is when the cost models of each skeleton become useful to help make an informed decision about which skeleton should cause the greatest speed up and this information can be used to guide the refactoring. In~\mbox{%DIFAUXCMD
\cite{parallelErl} }\hspace{0pt}%DIFAUXCMD
an initial benchmark of the program can be used to estimate the speed up that different skeletons could provide.
}
\DIFadd{Parallelisation can be tedious and difficult to do which makes it a good candidate for tool assistance. A refactoring tool can guide a programmer through the process of parallelisation. Much like how the data-driven refactorings have multiple small changes are required before the entire process can be considered ``finished'' changing a program to run in parallel is also a sequence of several smaller changes.
}
\section{ApplicativeDo}\label{applicativeDo}
\DIFadd{Haskell is a pure functional programming language. Purity in this context means that Haskell is side-effect free. This causes some confusion because in most other languages side-effects are allowed, if not the primary way that programs produce their ``result.'' Haskell instead models side-effect causing computations using Monads, which take effects of computation that are typically implicit and make them an explicit result of the program instead. Beyond modeling side effect causing operations, monads allow for computations to be supplemented with additional features~\mbox{%DIFAUXCMD
\citep{haskellWikiMonad}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{Haskell supports the writing of monadic code through the ``}\texttt{\DIFadd{do}}\DIFadd{'' syntax sugar, an example of this syntax can be seen in Figure~\ref{doF}. Monads and their }\texttt{\DIFadd{do}} \DIFadd{syntax have become a commonly accepted pattern for handling effects within the Haskell community.
}
\DIFadd{Applicative functors are another type class that describe computations performed within some context, but are less powerful than monads. They were first described by~\mbox{%DIFAUXCMD
\cite{mcbrideIdioms}}\hspace{0pt}%DIFAUXCMD
, and a fairly recent change to GHC made the }\texttt{\DIFadd{Applicative}} \DIFadd{type class}\footnote{\texttt{\DIFadd{Applicative}} \DIFadd{is what the GHC calls the type class that implements applicative functors.}} \DIFadd{a superclass of }\texttt{\DIFadd{Monad}} \DIFadd{which means that every instance of }\texttt{\DIFadd{Monad}} \DIFadd{now must also implement the }\texttt{\DIFadd{Applicative}} \DIFadd{interface as well.
}
\DIFadd{Compiler changes can't force a community to change its practices and }\texttt{\DIFadd{Applicative}} \DIFadd{remains under-utilised compared to }\texttt{\DIFadd{Monad}}\DIFadd{. This under-utilisation of applicative functors in Haskell has not gone unnoticed; in \mbox{%DIFAUXCMD
\cite{applicativeDo}}\hspace{0pt}%DIFAUXCMD
, an implementation of a language extension for GHC was introduced that changes the way Haskell interprets }\texttt{\DIFadd{do}} \DIFadd{statements so that applicative functors can be supported by the same }\texttt{\DIFadd{do}} \DIFadd{syntactic sugar. }\texttt{\DIFadd{Applicative}}\DIFadd{'s offer a key advantage over monads, when applicative functors are composed the results they calculate remain independent of each other. This means that values under applicative functors can be evaluated in parallel. Using the familiar }\texttt{\DIFadd{do}} \DIFadd{syntactic sugar to also support }\texttt{\DIFadd{Applicative}}\DIFadd{'s means that programs can become concurrent ``for free,'' this is the main motivation behind the applicative-do work~\mbox{%DIFAUXCMD
\citep{applicativeDo}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{Figure~\ref{doF} shows a simple function constructed using Haskell's do notation. The standard way that this function would be desugared is shown in Figure~\ref{fDesugar}. Finally Figure~\ref{fApDoDesugar} shows how the same function would be desugared when the }\texttt{\DIFadd{ApplicativeDo}} \DIFadd{language extension is turned on.
}
\begin{figure}[t]
\begin{lstlisting}
f = do
x1 <- A
x2 <- B x1
x3 <- C
return (x2,x3)
\end{lstlisting}
\caption{\DIFaddFL{A simple monadic function constructed using a }\texttt{\DIFaddFL{do}} \DIFaddFL{statement.}}
\label{doF}
\end{figure}
\begin{figure}[t]
\begin{lstlisting}
f = A >>=
(\x1 -> B x1 >>=
(\x2 -> C >>=
(\x3 -> return (x2,x3))))
\end{lstlisting}
\caption{\DIFaddFL{The desugared version of }\texttt{\DIFaddFL{f}} \DIFaddFL{from Figure~\ref{doF}.}}
\label{fDesugar}
\end{figure}
\begin{figure}[t]
\begin{lstlisting}
f = (\x2 x3 -> (x2, x3))
<$> (A >>= (\x1 -> B x1))
<*> C
\end{lstlisting}
\caption{\DIFaddFL{How }\texttt{\DIFaddFL{f}} \DIFaddFL{will be desugared when applicative do is turned on.}}
\label{fApDoDesugar}
\end{figure}
\DIFadd{The }\textit{\DIFadd{ApplicativeDo}} \DIFadd{algorithm will attempt to insert as many applies into the expression as possible. When the implementation of the }\texttt{\DIFadd{Applicative}} \DIFadd{instance evaluates the two arguments of apply in parallel, better performance can be achieved by adding more applies.
}
\DIFadd{There could be multiple ways to desugar a particular function. The }\textit{\DIFadd{applicativeDo}} \DIFadd{algorithm first assumes that every expression has an identical time cost, and from this assumption the algorithm heuristically determines the desugaring with the shortest execution time.
}
\section{Program transformations}\label{typeTrans}
\DIFadd{Refactoring is a type of program transformation but it does not constitute the whole field. A major difference between refactoring and other types of program transformations is that a refactoring must take into account the human readability of its output. Program transformations typically focus on just the algorithm whereas refactorings must take into account the broader effects a transformation has on a codebase and the context that programs exist in. Additionally the target program of a refactoring needs to be readable, maintainable, and keep proper layout and user comments. Other types of program transformation don't typically have these concerns. This section will describe some of the program transformation work most relevant to this thesis. First it will describe the type and transform system developed by~\mbox{%DIFAUXCMD
\citep{typeAndTransformSemantics}}\hspace{0pt}%DIFAUXCMD
. Next there will be a discussion of the previous methods of monadification found in the literature.
}
\subsection{Type and transform systems}
\DIFadd{The type-and-transform system described in~\mbox{%DIFAUXCMD
\citep{typeAndTransformSemantics} }\hspace{0pt}%DIFAUXCMD
is a system for a semantics preserving and type changing program transformations over the typed lambda calculus with let polymorphism. The type-and-transform system is limited to isomorphic types, there must be a way to convert between the two types and back again as described in Figure~\ref{transformIso}.
}
\begin{figure}[t]
\begin{lstlisting}
rep :: A -> R
abs :: R -> A
rep . abs = id
abs . rep = id
\end{lstlisting}
\caption{\DIFaddFL{The properties that must hold for the type-and-transform system to work over types }\texttt{\DIFaddFL{A}} \DIFaddFL{and }\texttt{\DIFaddFL{R}}}
\label{transformIso}
\end{figure}
\DIFadd{The type-and-transform system supports type-changing rewrites through typed rewrite rules that insert conversions between the source and target types as appropriate. To handle the fact that there are multiple ways to retype a program each rewrite rule is weighted to maximize the use of the target type, introduce the target type as soon as possible in the program, and delay the conversion back to the source type as late as possible.
}
\DIFadd{This work emphasises formalisation and its correctness and the work is done in the context of the lambda calculus rather than a full programming language. There is a Haskell implementation of their system but it is only a prototype though they state that they want to expand this work to work with Haskell however this has not been published yet.
}
\subsection{Automatic Monadification}\label{erwigMonad}
\DIFadd{Monadification is not a new problem and various solutions have been presented in the literature. In~\mbox{%DIFAUXCMD
\citep{lammelReuse} }\hspace{0pt}%DIFAUXCMD
monadification is performed in two steps. First the program is transformed into A-normal form}\footnote{\DIFadd{This is also known as sequencing}}\DIFadd{, which flattens applications into let expressions. The first line of Figure~\ref{anormal} shows a normal expression and line 3 of the same Figure shows that expression in A-normal form.
}
\begin{figure}[t]
\begin{lstlisting}
f (g x) (h y)
let x1 = g x in
let x2 = h y in
f x1 x2
\end{lstlisting}
\caption{\DIFaddFL{A-normal form converstion}}
\label{anormal}
\end{figure}
\DIFadd{Once the program has been converted into A-normal form, a let expression of the form:
}
\DIFadd{$ let x = t1 in t2 $
}
\DIFadd{Is transformed into:
}
\DIFadd{$ t1 >>= \lambda x. t2 $
}
\DIFadd{If the right hand side of the lambda is not already a monadic type then }\texttt{\DIFadd{return}} \DIFadd{will be introduced, e.g. $ t1 >>= \lambda x. return~~t2 $. The full transformation is given by inference rules in~\mbox{%DIFAUXCMD
\citep{lammelReuse}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{Monadification is developed further by~\mbox{%DIFAUXCMD
\citep{monadification}}\hspace{0pt}%DIFAUXCMD
. This work provides an algorithm for restricted call-by-value monadification as opposed to the semantics style inference rules defined in~\mbox{%DIFAUXCMD
\citep{lammelReuse}}\hspace{0pt}%DIFAUXCMD
. This work targets the lambda calculus extended with case and let expressions. The algorithm from~\mbox{%DIFAUXCMD
\citep{monadification} }\hspace{0pt}%DIFAUXCMD
is very similar to the one implemented in HaRe. It has the same precondition that every call to a monadified function must be fully saturated, which means that every call site of a target function needs all of its parameters to be named variables, and it produces the same style of monadification as the implementation provided in HaRe. A prototype implementation of this method was produced as a part of~\mbox{%DIFAUXCMD
\citep{monadification}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{The idea of providing refactoring as a monadification was first discussed in~\mbox{%DIFAUXCMD
\cite{monadSurvey}}\hspace{0pt}%DIFAUXCMD
. This source provides a comprehensive discussion of the different styles of monadification. There are several different, equally ``correct,'' ways to monadify a function. A refactoring tool, perhaps more than other types of program transformations, needs to consider what its users intentions and desires are for applying the transformation so that the most useful output can be produced. This is particularly important for refactoring tools because its output needs to be directly usable by developers. This makes the discussion of what monadification style to be very relevant since the monadified functions need to be easily usable. This issue of style is continued in Chapter~\ref{chp:monadification}.
}
\subsection{Data Type Transformations}\label{dtt}
\DIFadd{Data types are obviously the focus of data-driven refactorings and a considerable amount of work has been focused on them in the program transformation and refactoring literature.
}
\DIFadd{In~\mbox{%DIFAUXCMD
\cite{dataTypeFramework} }\hspace{0pt}%DIFAUXCMD
the authors present a set of transformation primitives that help modify data types, either through writing scripts or via a GUI in an interactive mode. This framework can be used to define refactorings but its creators implemented operators that can be used for behaviour changing transformations as well, which take the framework's capabilities beyond refactoring. This framework, for example, supports the insertion and deletion of constructor components~\mbox{%DIFAUXCMD
\citep{dataTypeFramework}}\hspace{0pt}%DIFAUXCMD
.
}
\DIFadd{This framework created a set of transformation operators that work over Haskell but the design of the framework is general enough to be implemented in other langauges that also support algebraic data types.