Hadoop Change Log
Release 0.19.1 - 2009-02-23
INCOMPATIBLE CHANGES
HADOOP-5225. Workaround for tmp file handling in HDFS. sync() is
incomplete as a result. Committed only to 0.19.x. (Raghu Angadi)
HADOOP-5224. HDFS append() is disabled. It throws
UnsupportedOperationException. Committed only to 0.19.x. (Raghu Angadi)
IMPROVEMENTS
HADOOP-4739. Fix spelling and grammar, improve phrasing of some sections in
mapred tutorial. (Vivek Ratan via cdouglas)
HADOOP-3894. DFSClient logging improvements. (Steve Loughran via shv)
HADOOP-5126. Remove empty file BlocksWithLocations.java (shv)
HADOOP-5127. Remove public methods in FSDirectory. (Jakob Homan via shv)
BUG FIXES
HADOOP-4697. Fix getBlockLocations in KosmosFileSystem to handle multiple
blocks correctly. (Sriram Rao via cdouglas)
HADOOP-4420. Add null checks for job, caused by invalid job IDs.
(Aaron Kimball via tomwhite)
HADOOP-4632. Fix TestJobHistoryVersion to use test.build.dir instead of the
current working directory for scratch space. (Amar Kamat via cdouglas)
HADOOP-4508. Fix FSDataOutputStream.getPos() for append. (dhruba via
szetszwo)
HADOOP-4727. Fix a group checking bug in fill_stat_structure(...) in
fuse-dfs. (Brian Bockelman via szetszwo)
HADOOP-4731. Fix capacity scheduler to correctly remove job on completion
from waiting queue. (Amar Kamat via yhemanth)
HADOOP-4836. Correct typos in mapred related documentation. (Jordà Polo
via szetszwo)
HADOOP-4821. Usage descriptions in the Quotas guide documentation are
incorrect. (Boris Shkolnik via hairong)
HADOOP-4847. Moves the loading of OutputCommitter to the Task.
(Amareshwari Sriramadasu via ddas)
HADOOP-4966. Marks completed setup tasks for removal.
(Amareshwari Sriramadasu via ddas)
HADOOP-4992. Fixes a package name problem introduced by HADOOP-4847.
(Amareshwari Sriramadasu via ddas)
HADOOP-4982. TestFsck should run in Eclipse. (shv)
HADOOP-5008. TestReplication#testPendingReplicationRetry leaves an opened
fd unclosed. (hairong)
HADOOP-4906. Fix TaskTracker OOM by keeping a shallow copy of JobConf in
TaskTracker.TaskInProgress. (Sharad Agarwal via acmurthy)
HADOOP-4918. Fix bzip2 compression to work with Sequence Files.
(Zheng Shao via dhruba).
HADOOP-4965. TestFileAppend3 should close FileSystem. (shv)
HADOOP-4967. Fixes a race condition in the JvmManager to do with killing
tasks. (ddas)
HADOOP-5002. Fixes a problem to do with the order of initialization of
reduce task and instantiating the reducer class.
(Amareshwari Sriramadasu via ddas)
HADOOP-5009. DataNode#shutdown sometimes leaves data block scanner
verification log unclosed. (hairong)
HADOOP-5086. Use the appropriate FileSystem for trash URIs. (cdouglas)
HADOOP-4955. Make DBOutputFormat use column names from setOutput().
(Kevin Peterson via enis)
HADOOP-4862. Minor : HADOOP-3678 did not remove all the cases of
spurious IOExceptions logged by DataNode. (Raghu Angadi)
HADOOP-5034. NameNode should send both replication and deletion requests
to DataNode in one reply to a heartbeat. (hairong)
HADOOP-5156. TestHeartbeatHandling uses MiniDFSCluster.getNamesystem()
which does not exist in branches 0.19 and 0.20. (hairong)
HADOOP-4759. Removes temporary output directory for failed and
killed tasks by launching special CLEANUP tasks for the same.
(Amareshwari Sriramadasu via ddas)
HADOOP-5161. Accepted sockets do not get placed in
DataXceiverServer#childSockets. (hairong)
HADOOP-5193. Correct calculation of edits modification time. (shv)
HADOOP-4494. Allow libhdfs to append to files.
(Pete Wyckoff via dhruba)
HADOOP-5166. Fix JobTracker restart to work when ACLs are configured
for the JobTracker. (Amar Kamat via yhemanth).
HADOOP-5067. Fixes TaskInProgress.java to keep track of count of failed and
killed tasks correctly. (Amareshwari Sriramadasu via ddas)
HADOOP-4760. HDFS streams should not throw exceptions when closed twice.
(enis)
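The behaviour described in HADOOP-4760 (a second close() is harmless) can be sketched as follows. This is a minimal illustration only: IdempotentCloseStream and releaseCount are hypothetical names, and this is not the actual HDFS stream code.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stream wrapper used only to illustrate an idempotent close();
// it is not a class from Hadoop.
class IdempotentCloseStream implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private int releaseCount = 0; // how many times resources were really released

    @Override
    public void close() {
        // Only the first call flips the flag and releases resources.
        // Any later call returns quietly instead of throwing.
        if (!closed.compareAndSet(false, true)) {
            return;
        }
        releaseCount++;
    }

    int releaseCount() {
        return releaseCount;
    }

    boolean isClosed() {
        return closed.get();
    }
}
```

Guarding the release with a single flag keeps the second call cheap and leaves callers free to close defensively, for example in a finally block.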
Release 0.19.0 - 2008-11-18
INCOMPATIBLE CHANGES
HADOOP-3595. Remove deprecated methods for mapred.combine.once
functionality, which was necessary to provide backwards-compatible
combiner semantics for 0.18. (cdouglas via omalley)
HADOOP-3667. Remove the following deprecated methods from JobConf:
addInputPath(Path)
getInputPaths()
getMapOutputCompressionType()
getOutputPath()
getSystemDir()
setInputPath(Path)
setMapOutputCompressionType(CompressionType style)
setOutputPath(Path)
(Amareshwari Sriramadasu via omalley)
HADOOP-3652. Remove deprecated class OutputFormatBase.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-2885. Break the hadoop.dfs package into separate packages under
hadoop.hdfs that reflect whether they are client, server, protocol,
etc. DistributedFileSystem and DFSClient have moved and are now
considered package private. (Sanjay Radia via omalley)
HADOOP-2325. Require Java 6. (cutting)
HADOOP-372. Add support for multiple input paths with a different
InputFormat and Mapper for each path. (Chris Smith via tomwhite)
HADOOP-1700. Support appending to file in HDFS. (dhruba)
HADOOP-3792. Make FsShell -test consistent with unix semantics, returning
zero for true and non-zero for false. (Ben Slusky via cdouglas)
HADOOP-3664. Remove the deprecated method InputFormat.validateInput,
which is no longer needed. (tomwhite via omalley)
HADOOP-3549. Give more meaningful errno's in libhdfs. In particular,
EACCES is returned for permission problems. (Ben Slusky via omalley)
HADOOP-4036. ResourceStatus was added to TaskTrackerStatus by HADOOP-3759,
so increment the InterTrackerProtocol version. (Hemanth Yamijala via
omalley)
HADOOP-3150. Moves task promotion to tasks. Defines a new interface for
committing output files. Moves job setup to jobclient, and moves jobcleanup
to a separate task. (Amareshwari Sriramadasu via ddas)
HADOOP-3446. Keep map outputs in memory during the reduce. Remove
fs.inmemory.size.mb and replace with properties defining in memory map
output retention during the shuffle and reduce relative to maximum heap
usage. (cdouglas)
HADOOP-3245. Adds the feature for supporting JobTracker restart. Running
jobs can be recovered from the history file. The history file format has
been modified to support recovery. The task attempt ID now has the
JobTracker start time to distinguish attempts of the same TIP across
restarts. (Amar Ramesh Kamat via ddas)
HADOOP-4007. Remove DFSFileInfo - FileStatus is sufficient.
(Sanjay Radia via hairong)
HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool
interface and GenericOptionsParser. (Enis Soztutar via acmurthy)
HADOOP-2816. Cluster summary at name node web reports the space
utilization as:
Configured Capacity: capacity of all the data directories - Reserved space
Present Capacity: Space available for dfs, i.e. remaining+used space
DFS Used%: DFS used space/Present Capacity
(Suresh Srinivas via hairong)
HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace
quotas in 0.18. (rangadi)
HADOOP-4293. Make Configuration Writable and remove unreleased
WritableJobConf. Configuration.write is renamed to writeXml. (omalley)
HADOOP-4281. Change dfsadmin to report available disk space in a format
consistent with the web interface as defined in HADOOP-2816. (Suresh
Srinivas via cdouglas)
HADOOP-4430. Further change the cluster summary at name node web that was
changed in HADOOP-2816:
Non DFS Used - This indicates the disk space taken by non-DFS files from
the Configured Capacity
DFS Used % - DFS Used % of Configured Capacity
DFS Remaining % - Remaining % of Configured Capacity available for DFS use
DFS command line report reflects the same change. Config parameter
dfs.datanode.du.pct is no longer used and is removed from the
hadoop-default.xml. (Suresh Srinivas via hairong)
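The figures defined in HADOOP-2816 and HADOOP-4430 can be worked through with a small sketch. CapacitySummary and its field names are hypothetical, and treating Non DFS Used as Configured Capacity minus DFS used minus remaining is an assumption drawn from the descriptions above, not a statement of the NameNode's code.

```java
// Hypothetical helper for illustration; names are not taken from Hadoop.
class CapacitySummary {
    final long configured; // capacity of all data directories minus reserved space
    final long dfsUsed;    // bytes used by DFS blocks
    final long remaining;  // bytes still available for DFS

    CapacitySummary(long rawCapacity, long reserved, long dfsUsed, long remaining) {
        this.configured = rawCapacity - reserved;
        this.dfsUsed = dfsUsed;
        this.remaining = remaining;
    }

    // Present Capacity: remaining + used space.
    long presentCapacity() {
        return dfsUsed + remaining;
    }

    // Assumption: whatever part of the configured capacity is neither
    // DFS-used nor remaining is taken by non-DFS files.
    long nonDfsUsed() {
        return configured - dfsUsed - remaining;
    }

    // DFS Used % as a percentage of Configured Capacity (HADOOP-4430).
    double dfsUsedPercent() {
        return 100.0 * dfsUsed / configured;
    }

    // DFS Remaining % as a percentage of Configured Capacity (HADOOP-4430).
    double dfsRemainingPercent() {
        return 100.0 * remaining / configured;
    }
}
```

For example, with 1100 units of raw capacity, 100 reserved, 250 used by DFS and 500 remaining, the configured capacity is 1000, present capacity is 750, non-DFS use is 250, and the two percentages are 25 and 50.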
HADOOP-4116. Balancer should provide better resource management. (hairong)
NEW FEATURES
HADOOP-3341. Allow streaming jobs to specify the field separator for map
and reduce input and output. The new configuration values are:
stream.map.input.field.separator
stream.map.output.field.separator
stream.reduce.input.field.separator
stream.reduce.output.field.separator
All of them default to "\t". (Zheng Shao via omalley)
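A rough sketch of how such a separator setting might be applied to one line of text. StreamFieldSplit and its methods are hypothetical, java.util.Properties stands in for the job configuration, and splitting at the first occurrence of the separator is an assumption for illustration rather than a description of the streaming code itself.

```java
import java.util.Properties;

// Hypothetical helper; only the property names come from the entry above.
class StreamFieldSplit {
    static final String DEFAULT_SEPARATOR = "\t";

    // Look up a separator property, falling back to the default tab.
    static String separator(Properties conf, String name) {
        return conf.getProperty(name, DEFAULT_SEPARATOR);
    }

    // Split a line into {key, value} at the first occurrence of the separator.
    // Assumption: a line without the separator is all key, with an empty value.
    static String[] splitKeyValue(String line, String sep) {
        int idx = line.indexOf(sep);
        if (idx < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, idx), line.substring(idx + sep.length()) };
    }
}
```

With stream.map.output.field.separator set to a comma, the line "a,b,c" would split into key "a" and value "b,c" under these assumptions.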
HADOOP-3479. Defines the configuration file for the resource manager in
Hadoop. You can configure various parameters related to scheduling, such
as queues and queue properties here. The properties for a queue follow a
naming convention, such as hadoop.rm.queue.queue-name.property-name.
(Hemanth Yamijala via ddas)
HADOOP-3149. Adds a way in which map/reduce tasks can create multiple
outputs. (Alejandro Abdelnur via ddas)
HADOOP-3714. Add a new contrib, bash-tab-completion, which enables
bash tab completion for the bin/hadoop script. See the README file
in the contrib directory for the installation. (Chris Smith via enis)
HADOOP-3730. Adds a new JobConf constructor that disables loading
default configurations. (Alejandro Abdelnur via ddas)
HADOOP-3772. Add a new Hadoop Instrumentation api for the JobTracker and
the TaskTracker, refactor Hadoop Metrics as an implementation of the api.
(Ari Rabkin via acmurthy)
HADOOP-2302. Provides a comparator for numerical sorting of key fields.
(ddas)
HADOOP-153. Provides a way to skip bad records. (Sharad Agarwal via ddas)
HADOOP-657. Free disk space should be modelled and used by the scheduler
to make scheduling decisions. (Ari Rabkin via omalley)
HADOOP-3719. Initial checkin of Chukwa, which is a data collection and
analysis framework. (Jerome Boulon, Andy Konwinski, Ari Rabkin,
and Eric Yang)
HADOOP-3873. Add -filelimit and -sizelimit options to distcp to cap the
number of files/bytes copied in a particular run to support incremental
updates and mirroring. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3585. FailMon package for hardware failure monitoring and
analysis of anomalies. (Ioannis Koltsidas via dhruba)
HADOOP-1480. Add counters to the C++ Pipes API. (acmurthy via omalley)
HADOOP-3854. Add support for pluggable servlet filters in the HttpServers.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3759. Provides ability to run memory intensive jobs without
affecting other running tasks on the nodes. (Hemanth Yamijala via ddas)
HADOOP-3746. Add a fair share scheduler. (Matei Zaharia via omalley)
HADOOP-3754. Add a thrift interface to access HDFS. (dhruba via omalley)
HADOOP-3828. Provides a way to write skipped records to DFS.
(Sharad Agarwal via ddas)
HADOOP-3948. Separate name-node edits and fsimage directories.
(Lohit Vijayarenu via shv)
HADOOP-3939. Add an option to DistCp to delete files at the destination
not present at the source. (Tsz Wo (Nicholas) Sze via cdouglas)
HADOOP-3601. Add a new contrib module for Hive, which is a sql-like
query processing tool that uses map/reduce. (Ashish Thusoo via omalley)
HADOOP-3866. Added sort and multi-job updates in the JobTracker web ui.
(Craig Weisenfluh via omalley)
HADOOP-3698. Add access control to control who is allowed to submit or
modify jobs in the JobTracker. (Hemanth Yamijala via omalley)
HADOOP-1869. Support access times for HDFS files. (dhruba)
HADOOP-3941. Extend FileSystem API to return file-checksums.
(szetszwo)
HADOOP-3581. Prevents memory intensive user tasks from taking down
nodes. (Vinod K V via ddas)
HADOOP-3970. Provides a way to recover counters written to JobHistory.
(Amar Kamat via ddas)
HADOOP-3702. Adds ChainMapper and ChainReducer classes, which allow composing
chains of Maps and Reduces in a single Map/Reduce job, something like
MAP+ / REDUCE MAP*. (Alejandro Abdelnur via ddas)
HADOOP-3445. Add capacity scheduler that provides guaranteed capacities to
queues as a percentage of the cluster. (Vivek Ratan via omalley)
HADOOP-3992. Add a synthetic load generation facility to the test
directory. (hairong via szetszwo)
HADOOP-3981. Implement a distributed file checksum algorithm in HDFS
and change DistCp to use file checksum for comparing src and dst files
(szetszwo)
HADOOP-3829. Narrow down skipped records based on user acceptable value.
(Sharad Agarwal via ddas)
HADOOP-3930. Add common interfaces for the pluggable schedulers and the
cli & gui clients. (Sreekanth Ramakrishnan via omalley)
HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem. (szetszwo)
HADOOP-249. Reuse JVMs across Map-Reduce Tasks.
Configuration changes to hadoop-default.xml:
add mapred.job.reuse.jvm.num.tasks
(Devaraj Das via acmurthy)
HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the
query language. (tomwhite)
HADOOP-2536. Implement JDBC-based database input and output formats to
allow Map-Reduce applications to work with databases. (Fredrik Hedberg and
Enis Soztutar via acmurthy)
HADOOP-3019. A new library to support total order partitions.
(cdouglas via omalley)
HADOOP-3924. Added a 'KILLED' job status. (Subramaniam Krishnan via
acmurthy)
IMPROVEMENTS
HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
(zshao)
HADOOP-4106. libhdfs: add time, permission and user attribute support
(part 2). (Pete Wyckoff through zshao)
HADOOP-4104. libhdfs: add time, permission and user attribute support.
(Pete Wyckoff through zshao)
HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
(Pete Wyckoff through zshao)
HADOOP-3732. Delay initialization of datanode block verification till
the verification thread is started. (rangadi)
HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
(rangadi)
HADOOP-3577. Tools to inject blocks into name node and simulated
data nodes for testing. (Sanjay Radia via hairong)
HADOOP-2664. Add a lzop compatible codec, so that files compressed by lzop
may be processed by map/reduce. (cdouglas via omalley)
HADOOP-3655. Add additional ant properties to control junit. (Steve
Loughran via omalley)
HADOOP-3543. Update the copyright year to 2008. (cdouglas via omalley)
HADOOP-3587. Add a unit test for the contrib/data_join framework.
(cdouglas)
HADOOP-3402. Add terasort example program (omalley)
HADOOP-3660. Add replication factor for injecting blocks in simulated
datanodes. (Sanjay Radia via cdouglas)
HADOOP-3684. Add a cloning function to the contrib/data_join framework
permitting users to define a more efficient method for cloning values from
the reduce than serialization/deserialization. (Runping Qi via cdouglas)
HADOOP-3478. Improves the handling of map output fetching. Now the
randomization is by the hosts (and not the map outputs themselves).
(Jothi Padmanabhan via ddas)
HADOOP-3617. Removed redundant checks of accounting space in MapTask and
makes the spill thread persistent so as to avoid creating a new one for
each spill. (Chris Douglas via acmurthy)
HADOOP-3412. Factor the scheduler out of the JobTracker and make
it pluggable. (Tom White and Brice Arnould via omalley)
HADOOP-3756. Minor. Remove unused dfs.client.buffer.dir from
hadoop-default.xml. (rangadi)
HADOOP-3747. Adds counter support for MultipleOutputs.
(Alejandro Abdelnur via ddas)
HADOOP-3169. LeaseChecker daemon should not be started in DFSClient
constructor. (TszWo (Nicholas), SZE via hairong)
HADOOP-3824. Move base functionality of StatusHttpServer to a core
package. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3646. Add a bzip2 compatible codec, so bzip compressed data
may be processed by map/reduce. (Abdul Qadeer via cdouglas)
HADOOP-3861. MapFile.Reader and Writer should implement Closeable.
(tomwhite via omalley)
HADOOP-3791. Introduce generics into ReflectionUtils. (Chris Smith via
cdouglas)
HADOOP-3694. Improve unit test performance by changing
MiniDFSCluster to listen only on 127.0.0.1. (cutting)
HADOOP-3620. Namenode should synchronously resolve a datanode's network
location when the datanode registers. (hairong)
HADOOP-3860. NNThroughputBenchmark is extended with rename and delete
benchmarks. (shv)
HADOOP-3892. Include unix group name in JobConf. (Matei Zaharia via johan)
HADOOP-3875. Change the time period between heartbeats to be relative to
the end of the heartbeat rpc, rather than the start. This causes better
behavior if the JobTracker is overloaded. (acmurthy via omalley)
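The difference between the two timing rules in HADOOP-3875 can be sketched with plain arithmetic. HeartbeatSchedule and its method names are hypothetical, and the millisecond values in the note are made up for illustration.

```java
// Hypothetical helper contrasting the two ways of timing the next heartbeat.
class HeartbeatSchedule {
    // Interval measured from the start of the previous heartbeat RPC.
    static long nextFromStart(long rpcStartMs, long intervalMs) {
        return rpcStartMs + intervalMs;
    }

    // Interval measured from the end of the previous heartbeat RPC.
    static long nextFromEnd(long rpcEndMs, long intervalMs) {
        return rpcEndMs + intervalMs;
    }

    // Quiet time between the RPC returning and the next heartbeat being due.
    // Zero means the next heartbeat is already due when the RPC returns.
    static long idleGap(long nextSendMs, long rpcEndMs) {
        return Math.max(0L, nextSendMs - rpcEndMs);
    }
}
```

If an overloaded JobTracker takes 12 seconds to answer a heartbeat with a 10 second interval, timing from the start makes the next heartbeat due immediately, which adds to the load. Timing from the end always leaves the full interval between calls.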
HADOOP-3853. Move multiple input format (HADOOP-372) extension to
library package. (tomwhite via johan)
HADOOP-9. Use roulette scheduling for temporary space when the size
is not known. (Ari Rabkin via omalley)
HADOOP-3202. Use recursive delete rather than FileUtil.fullyDelete.
(Amareshwari Sriramadasu via omalley)
HADOOP-3368. Remove common-logging.properties from conf. (Steve Loughran
via omalley)
HADOOP-3851. Fix spelling mistake in FSNamesystemMetrics. (Steve Loughran
via omalley)
HADOOP-3780. Remove asynchronous resolution of network topology in the
JobTracker (Amar Kamat via omalley)
HADOOP-3852. Add ShellCommandExecutor.toString method to make nicer
error messages. (Steve Loughran via omalley)
HADOOP-3844. Include message of local exception in RPC client failures.
(Steve Loughran via omalley)
HADOOP-3935. Split out inner classes from DataNode.java. (johan)
HADOOP-3905. Create generic interfaces for edit log streams. (shv)
HADOOP-3062. Add metrics to DataNode and TaskTracker to record network
traffic for HDFS reads/writes and MR shuffling. (cdouglas)
HADOOP-3742. Remove HDFS from public java doc and add javadoc-dev for
generative javadoc for developers. (Sanjay Radia via omalley)
HADOOP-3944. Improve documentation for public TupleWritable class in
join package. (Chris Douglas via enis)
HADOOP-2330. Preallocate HDFS transaction log to improve performance.
(dhruba and hairong)
HADOOP-3965. Convert DataBlockScanner into a package private class. (shv)
HADOOP-3488. Prevent hadoop-daemon from rsync'ing log files (Stefan
Groschupf and Craig Macdonald via omalley)
HADOOP-3342. Change the kill task actions to require http post instead of
get to prevent accidental crawls from triggering it. (enis via omalley)
HADOOP-3937. Limit the job name in the job history filename to 50
characters. (Matei Zaharia via omalley)
HADOOP-3943. Remove unnecessary synchronization in
NetworkTopology.pseudoSortByDistance. (hairong via omalley)
HADOOP-3498. File globbing alternation should be able to span path
components. (tomwhite)
HADOOP-3361. Implement renames for NativeS3FileSystem.
(Albert Chern via tomwhite)
HADOOP-3605. Make EC2 scripts show an error message if AWS_ACCOUNT_ID is
unset. (Al Hoang via tomwhite)
HADOOP-4147. Remove unused class JobWithTaskContext from class
JobInProgress. (Amareshwari Sriramadasu via johan)
HADOOP-4151. Add a byte-comparable interface that both Text and
BytesWritable implement. (cdouglas via omalley)
HADOOP-4174. Move fs image/edit log methods from ClientProtocol to
NamenodeProtocol. (shv via szetszwo)
HADOOP-4181. Include a .gitignore and saveVersion.sh change to support
developing under git. (omalley)
HADOOP-4186. Factor LineReader out of LineRecordReader. (tomwhite via
omalley)
HADOOP-4184. Break the module dependencies between core, hdfs, and
mapred. (tomwhite via omalley)
HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
(Ramya R via nigel)
HADOOP-4117. Improve configurability of Hadoop EC2 instances.
(tomwhite)
HADOOP-2411. Add support for larger CPU EC2 instance types.
(Chris K Wensel via tomwhite)
HADOOP-4083. Changed the configuration attribute queue.name to
mapred.job.queue.name. (Hemanth Yamijala via acmurthy)
HADOOP-4194. Added the JobConf and JobID to job-related methods in
JobTrackerInstrumentation for better metrics. (Mac Yang via acmurthy)
HADOOP-3975. Change test-patch script to report the working dir
modifications preventing the suite from being run. (Ramya R via cdouglas)
HADOOP-4124. Added a command-line switch to allow users to set job
priorities, also allow it to be manipulated via the web-ui. (Hemanth
Yamijala via acmurthy)
HADOOP-2165. Augmented JobHistory to include the URIs to the tasks'
userlogs. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4062. Remove the synchronization on the output stream when a
connection is closed and also remove an undesirable exception when
a client is stopped while there is no pending RPC request. (hairong)
HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
(szetszwo)
HADOOP-4006. Clean up FSConstants and move some of the constants to
better places. (Sanjay Radia via rangadi)
HADOOP-4279. Trace the seeds of random sequences in append unit tests to
make intermittent failures reproducible. (szetszwo via cdouglas)
HADOOP-4209. Remove the change to the format of task attempt id by
incrementing the task attempt numbers by 1000 when the job restarts.
(Amar Kamat via omalley)
HADOOP-4301. Adds forrest doc for the skip bad records feature.
(Sharad Agarwal via ddas)
HADOOP-4354. Separate TestDatanodeDeath.testDatanodeDeath() into 4 tests.
(szetszwo)
HADOOP-3790. Add more unit tests for testing HDFS file append. (szetszwo)
HADOOP-4321. Include documentation for the capacity scheduler. (Hemanth
Yamijala via omalley)
HADOOP-4424. Change menu layout for Hadoop documentation (Boris Shkolnik
via cdouglas).
HADOOP-4438. Update forrest documentation to include missing FsShell
commands. (Suresh Srinivas via cdouglas)
HADOOP-4105. Add forrest documentation for libhdfs.
(Pete Wyckoff via cutting)
HADOOP-4510. Make getTaskOutputPath public. (Chris Wensel via omalley)
OPTIMIZATIONS
HADOOP-3556. Removed lock contention in MD5Hash by replacing the
singleton MessageDigester with an instance per thread using
ThreadLocal. (Iván de Prado via omalley)
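The per-thread digester idea in HADOOP-3556 can be sketched as below. ThreadLocalMd5 is a hypothetical class, not Hadoop's MD5Hash, and ThreadLocal.withInitial needs Java 8 or later, which is newer than the Java 6 this release required.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class illustrating one MessageDigest per thread instead of a
// single shared instance guarded by a lock.
class ThreadLocalMd5 {
    private static final ThreadLocal<MessageDigest> DIGESTER =
        ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                // MD5 is a required algorithm on standard JVMs.
                throw new IllegalStateException(e);
            }
        });

    // Each thread reuses its own digester, so no synchronization is needed.
    static byte[] digest(byte[] data) {
        MessageDigest md = DIGESTER.get();
        md.reset();
        return md.digest(data);
    }

    // Lower-case hex rendering of a digest.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

MessageDigest instances are not thread-safe, so the alternatives are a lock around a shared instance or one instance per thread. The second removes the contention at the cost of one small object per thread.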
HADOOP-3328. When client is writing data to DFS, only the last
datanode in the pipeline needs to verify the checksum. Saves around
30% CPU on intermediate datanodes. (rangadi)
HADOOP-3863. Use a thread-local string encoder rather than a static one
that is protected by a lock. (acmurthy via omalley)
HADOOP-3864. Prevent the JobTracker from locking up when a job is being
initialized. (acmurthy via omalley)
HADOOP-3816. Faster directory listing in KFS. (Sriram Rao via omalley)
HADOOP-2130. Pipes submit job should have both blocking and non-blocking
versions. (acmurthy via omalley)
HADOOP-3769. Make the SampleMapper and SampleReducer from
GenericMRLoadGenerator public, so they can be used in other contexts.
(Lingyun Yang via omalley)
HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading
them from a different .crc file. (Jothi Padmanabhan via ddas)
HADOOP-3638. Caches the iFile index files in memory to reduce seeks
(Jothi Padmanabhan via ddas)
HADOOP-4225. FSEditLog.logOpenFile() should persist accessTime
rather than modificationTime. (shv)
HADOOP-4380. Made several new classes (Child, JVMId,
JobTrackerInstrumentation, QueueManager, ResourceEstimator,
TaskTrackerInstrumentation, and TaskTrackerMetricsInst) in
org.apache.hadoop.mapred package private instead of public. (omalley)
BUG FIXES
HADOOP-3563. Refactor the distributed upgrade code so that it is
easier to identify datanode and namenode related code. (dhruba)
HADOOP-3640. Fix the read method in the NativeS3InputStream. (tomwhite via
omalley)
HADOOP-3711. Fixes the Streaming input parsing to properly find the
separator. (Amareshwari Sriramadasu via ddas)
HADOOP-3725. Prevent TestMiniMRMapDebugScript from swallowing exceptions.
(Steve Loughran via cdouglas)
HADOOP-3726. Throw exceptions from TestCLI setup and teardown instead of
swallowing them. (Steve Loughran via cdouglas)
HADOOP-3721. Refactor CompositeRecordReader and related mapred.join classes
to make them clearer. (cdouglas)
HADOOP-3720. Re-read the config file when dfsadmin -refreshNodes is invoked
so dfs.hosts and dfs.hosts.exclude are observed. (lohit vijayarenu via
cdouglas)
HADOOP-3485. Allow writing to files over fuse.
(Pete Wyckoff via dhruba)
HADOOP-3723. The flags to the libhdfs.create call can be treated as
a bitmask. (Pete Wyckoff via dhruba)
HADOOP-3643. Filter out completed tasks when asking for running tasks in
the JobTracker web/ui. (Amar Kamat via omalley)
HADOOP-3777. Ensure that Lzo compressors/decompressors correctly handle the
case where native libraries aren't available. (Chris Douglas via acmurthy)
HADOOP-3728. Fix SleepJob so that it doesn't depend on temporary files,
this ensures we can now run more than one instance of SleepJob
simultaneously. (Chris Douglas via acmurthy)
HADOOP-3795. Fix saving image files on Namenode with different checkpoint
stamps. (Lohit Vijayarenu via mahadev)
HADOOP-3624. Improving createeditslog to create tree directory structure.
(Lohit Vijayarenu via mahadev)
HADOOP-3778. DFSInputStream.seek() did not retry in case of some errors.
(LN via rangadi)
HADOOP-3661. The handling of moving files deleted through fuse-dfs to
Trash is made similar to the behaviour from dfs shell.
(Pete Wyckoff via dhruba)
HADOOP-3819. Unset LANG and LC_CTYPE in saveVersion.sh to make it
compatible with non-English locales. (Rong-En Fan via cdouglas)
HADOOP-3848. Cache calls to getSystemDir in the TaskTracker instead of
calling it for each task start. (acmurthy via omalley)
HADOOP-3131. Fix reduce progress reporting for compressed intermediate
data. (Matei Zaharia via acmurthy)
HADOOP-3796. fuse-dfs configuration is implemented as file system
mount options. (Pete Wyckoff via dhruba)
HADOOP-3836. Fix TestMultipleOutputs to correctly clean up. (Alejandro
Abdelnur via acmurthy)
HADOOP-3805. Improve fuse-dfs write performance.
(Pete Wyckoff via zshao)
HADOOP-3846. Fix unit test CreateEditsLog to generate paths correctly.
(Lohit Vijayarenu via cdouglas)
HADOOP-3904. Fix unit tests using the old dfs package name.
(TszWo (Nicholas), SZE via johan)
HADOOP-3319. Fix some HOD error messages to go to stderr instead of
stdout. (Vinod Kumar Vavilapalli via omalley)
HADOOP-3907. Move INodeDirectoryWithQuota to its own .java file.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3919. Fix attribute name in hadoop-default for
mapred.jobtracker.instrumentation. (Ari Rabkin via omalley)
HADOOP-3903. Change the package name for the servlets to be hdfs instead of
dfs. (Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3773. Change Pipes to set the default map output key and value
types correctly. (Koji Noguchi via omalley)
HADOOP-3952. Fix compilation error in TestDataJoin referencing dfs package.
(omalley)
HADOOP-3951. Fix package name for FSNamesystem logs and modify other
hard-coded Logs to use the class name. (cdouglas)
HADOOP-3889. Improve error reporting from HftpFileSystem, handling in
DistCp. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3946. Fix TestMapRed after HADOOP-3664. (tomwhite via omalley)
HADOOP-3949. Remove duplicate jars from Chukwa. (Jerome Boulon via omalley)
HADOOP-3933. DataNode sometimes sends up to io.bytes.per.checksum bytes
more than required to client. (Ning Li via rangadi)
HADOOP-3962. Shell command "fs -count" should support paths with different
file systems. (Tsz Wo (Nicholas), SZE via mahadev)
HADOOP-3957. Fix javac warnings in DistCp and TestCopyFiles. (Tsz Wo
(Nicholas), SZE via cdouglas)
HADOOP-3958. Fix TestMapRed to check the success of test-job. (omalley via
acmurthy)
HADOOP-3985. Fix TestHDFSServerPorts to use random ports. (Hairong Kuang
via omalley)
HADOOP-3964. Fix javadoc warnings introduced by FailMon. (dhruba)
HADOOP-3785. Fix FileSystem cache to be case-insensitive for scheme and
authority. (Bill de hOra via cdouglas)
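One way to make a cache lookup ignore case in the scheme and authority, as HADOOP-3785 describes, is to normalize both when the key is built. FsCacheKey below is a hypothetical class for illustration. Hadoop's real cache key may hold more fields than these two.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Hypothetical cache key that compares scheme and authority case-insensitively.
final class FsCacheKey {
    private final String scheme;
    private final String authority;

    FsCacheKey(URI uri) {
        // Normalize once at construction so equals() and hashCode() agree.
        this.scheme = lower(uri.getScheme());
        this.authority = lower(uri.getAuthority());
    }

    private static String lower(String s) {
        return s == null ? "" : s.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FsCacheKey)) {
            return false;
        }
        FsCacheKey other = (FsCacheKey) o;
        return scheme.equals(other.scheme) && authority.equals(other.authority);
    }

    @Override
    public int hashCode() {
        return Objects.hash(scheme, authority);
    }
}
```

Under this sketch, HDFS://NameNode:9000/a and hdfs://namenode:9000/b map to the same key, because the path is ignored and the other two parts are lower-cased.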
HADOOP-3506. Fix a rare NPE caused by error handling in S3. (Tom White via
cdouglas)
HADOOP-3705. Fix mapred.join parser to accept InputFormats named with
underscore and static, inner classes. (cdouglas)
HADOOP-4023. Fix javadoc warnings introduced when the HDFS javadoc was
made private. (omalley)
HADOOP-4030. Remove lzop from the default list of codecs. (Arun Murthy via
cdouglas)
HADOOP-3961. Fix task disk space requirement estimates for virtual
input jobs. Delays limiting task placement until after 10% of the maps
have finished. (Ari Rabkin via omalley)
HADOOP-2168. Fix problem with C++ record reader's progress not being
reported to framework. (acmurthy via omalley)
HADOOP-3966. Copy findbugs generated output files to PATCH_DIR while
running test-patch. (Ramya R via lohit)
HADOOP-4037. Fix the eclipse plugin for versions of kfs and log4j. (nigel
via omalley)
HADOOP-3950. Cause the Mini MR cluster to wait for task trackers to
register before continuing. (enis via omalley)
HADOOP-3910. Remove unused ClusterTestDFSNamespaceLogging and
ClusterTestDFS. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3954. Disable record skipping by default. (Sharad Agarwal via
cdouglas)
HADOOP-4050. Fix TestFairScheduler to use absolute paths for the work
directory. (Matei Zaharia via omalley)
HADOOP-4069. Keep temporary test files from TestKosmosFileSystem under
test.build.data instead of /tmp. (lohit via omalley)
HADOOP-4078. Create test files for TestKosmosFileSystem in separate
directory under test.build.data. (lohit)
HADOOP-3968. Fix getFileBlockLocations calls to use FileStatus instead
of Path reflecting the new API. (Pete Wyckoff via lohit)
HADOOP-3963. libhdfs does not exit on its own, instead it returns error
to the caller and behaves as a true library. (Pete Wyckoff via dhruba)
HADOOP-4100. Removes the cleanupTask scheduling from the Scheduler
implementations and moves it to the JobTracker.
(Amareshwari Sriramadasu via ddas)
HADOOP-4097. Make hive work well with speculative execution turned on.
(Joydeep Sen Sarma via dhruba)
HADOOP-4113. Changes to libhdfs to not exit on its own, rather return
an error code to the caller. (Pete Wyckoff via dhruba)
HADOOP-4054. Remove duplicate lease removal during edit log loading.
(hairong)
HADOOP-4071. FSNamesystem.isReplicationInProgress should add an
underReplicated block to the neededReplication queue using method
"add" not "update". (hairong)
HADOOP-4154. Fix type warnings in WritableUtils. (szetszwo via omalley)
HADOOP-4133. Log files generated by Hive should reside in the
build directory. (Prasad Chakka via dhruba)
HADOOP-4094. Hive now has hive-default.xml and hive-site.xml similar
to core hadoop. (Prasad Chakka via dhruba)
HADOOP-4112. Handles cleanupTask in JobHistory.
(Amareshwari Sriramadasu via ddas)
HADOOP-3831. Very slow reading clients sometimes failed while reading.
(rangadi)
HADOOP-4155. Use JobTracker's start time while initializing JobHistory's
JobTracker Unique String. (lohit)
HADOOP-4099. Fix null pointer when using HFTP from an 0.18 server.
(dhruba via omalley)
HADOOP-3570. Includes user specified libjar files in the client side
classpath. (Sharad Agarwal via ddas)
HADOOP-4129. Changed memory limits of TaskTracker and Tasks to be in
KiloBytes rather than bytes. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4139. Optimize Hive multi group-by.
(Namit Jain via dhruba)
HADOOP-3911. Add a check to fsck options to make sure -files is not
the first option to resolve conflicts with GenericOptionsParser.
(lohit)
HADOOP-3623. Refactor LeaseManager. (szetszwo)
HADOOP-4125. Handles Reduce cleanup tip on the web UI.
(Amareshwari Sriramadasu via ddas)
HADOOP-4087. Hive Metastore API for php and python clients.
(Prasad Chakka via dhruba)
HADOOP-4197. Update DATA_TRANSFER_VERSION for HADOOP-3981. (szetszwo)
HADOOP-4138. Refactor the Hive SerDe library to better structure
the interfaces to the serializer and de-serializer.
(Zheng Shao via dhruba)
HADOOP-4195. Close compressor before returning to codec pool.
(acmurthy via omalley)
HADOOP-2403. Escapes some special characters before logging to
history files. (Amareshwari Sriramadasu via ddas)
HADOOP-4200. Fix a bug in the test-patch.sh script.
(Ramya R via nigel)
HADOOP-4084. Add explain plan capabilities to Hive Query Language.
(Ashish Thusoo via dhruba)
HADOOP-4121. Preserve cause for exception if the initialization of
HistoryViewer for JobHistory fails. (Amareshwari Sri Ramadasu via
acmurthy)
HADOOP-4213. Fixes NPE in TestLimitTasksPerJobTaskScheduler.
(Sreekanth Ramakrishnan via ddas)
HADOOP-4077. Setting access and modification time for a file
requires write permissions on the file. (dhruba)
HADOOP-3592. Fix a couple of possible file leaks in FileUtil.
(Bill de hOra via rangadi)
HADOOP-4120. Hive interactive shell records the time taken by a
query. (Raghotham Murthy via dhruba)
HADOOP-4090. The hive scripts pick up hadoop from HADOOP_HOME
and then the path. (Raghotham Murthy via dhruba)
HADOOP-4242. Remove extra ";" in FSDirectory that blocks compilation
in some IDEs. (szetszwo via omalley)
HADOOP-4249. Fix eclipse path to include the hsqldb.jar. (szetszwo via
omalley)
HADOOP-4247. Move InputSampler into org.apache.hadoop.mapred.lib, so that
examples.jar doesn't depend on tools.jar. (omalley)
HADOOP-4269. Fix the deprecation of LineReader by extending the new class
into the old name and deprecating it. Also update the tests to test the
new class. (cdouglas via omalley)
HADOOP-4280. Fix conversions between seconds in C and milliseconds in
Java for access times for files. (Pete Wyckoff via rangadi)
HADOOP-4254. -setSpaceQuota command does not convert "TB" extension to
terabytes properly. Implementation now uses StringUtils for parsing this.
(Raghu Angadi)
HADOOP-4259. Findbugs should run over tools.jar also. (cdouglas via
omalley)
HADOOP-4275. Move public method isJobValidName from JobID to a private
method in JobTracker. (omalley)
HADOOP-4173. Fix failures in TestProcfsBasedProcessTree and
TestTaskTrackerMemoryManager tests. ProcfsBasedProcessTree and
memory management in TaskTracker are disabled on Windows.
(Vinod K V via rangadi)
HADOOP-4189. Fixes the history blocksize & intertracker protocol version
issues introduced as part of HADOOP-3245. (Amar Kamat via ddas)
HADOOP-4190. Fixes the backward compatibility issue with Job History
introduced by HADOOP-3245 and HADOOP-2403. (Amar Kamat via ddas)
HADOOP-4237. Fixes the TestStreamingBadRecords.testNarrowDown testcase.
(Sharad Agarwal via ddas)
HADOOP-4274. Capacity scheduler accidentally modifies the underlying
data structures when browsing the job lists. (Hemanth Yamijala via omalley)
HADOOP-4309. Fix eclipse-plugin compilation. (cdouglas)
HADOOP-4232. Fix race condition in JVM reuse when multiple slots become
free. (ddas via acmurthy)
HADOOP-4302. Fix a race condition in TestReduceFetch that can yield false
negatives. (cdouglas)
HADOOP-3942. Update distcp documentation to include features introduced in
HADOOP-3873, HADOOP-3939. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-4319. fuse-dfs dfs_read function returns as many bytes as it is
told to read unless end-of-file is reached. (Pete Wyckoff via dhruba)
HADOOP-4246. Ensure we have the correct lower bound on the number of
retries for fetching map-outputs; also fixed the handling, for small jobs,
of the reducer automatically killing itself when too many unique
map-outputs could not be fetched. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-4163. Report FSErrors from map output fetch threads instead of
merely logging them. (Sharad Agarwal via cdouglas)
HADOOP-4261. Adds a setup task for jobs. This is required so that we
don't set up jobs that haven't been initialized yet (since init could lead
to job failure). Only after the init has successfully happened do we
launch the setupJob task. (Amareshwari Sriramadasu via ddas)
HADOOP-4256. Removes Completed and Failed Job tables from
jobqueue_details.jsp. (Sreekanth Ramakrishnan via ddas)
HADOOP-4267. Occasional exceptions while shutting down HSQLDB are logged
but not rethrown. (enis)
HADOOP-4018. The number of tasks for a single job cannot exceed a
pre-configured maximum value. (dhruba)
HADOOP-4288. Fixes an NPE problem in CapacityScheduler.
(Amar Kamat via ddas)
HADOOP-4014. Create hard links with 'fsutil hardlink' on Windows. (shv)
HADOOP-4393. Merged org.apache.hadoop.fs.permission.AccessControlException
and org.apache.hadoop.security.AccessControlIOException into a single
class hadoop.security.AccessControlException. (omalley via acmurthy)
HADOOP-4287. Fixes an issue to do with maintaining counts of running/pending
maps/reduces. (Sreekanth Ramakrishnan via ddas)
HADOOP-4361. Makes sure that jobs killed from command line are killed
fast (i.e., there is a slot to run the cleanup task soon).
(Amareshwari Sriramadasu via ddas)
HADOOP-4400. Add "hdfs://" to fs.default.name on quickstart.html.
(Jeff Hammerbacher via omalley)
HADOOP-4378. Fix TestJobQueueInformation to use SleepJob rather than
WordCount via TestMiniMRWithDFS. (Sreekanth Ramakrishnan via acmurthy)
HADOOP-4376. Fix formatting in hadoop-default.xml for
hadoop.http.filter.initializers. (Enis Soztutar via acmurthy)
HADOOP-4410. Adds an extra arg to the API FileUtil.makeShellPath to
determine whether to canonicalize file paths or not.
(Amareshwari Sriramadasu via ddas)
HADOOP-4236. Ensure un-initialized jobs are killed correctly on
user-demand. (Sharad Agarwal via acmurthy)
HADOOP-4373. Fix calculation of Guaranteed Capacity for the
capacity-scheduler. (Hemanth Yamijala via acmurthy)