-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdbgheap.txt
351 lines (275 loc) · 18.3 KB
/
dbgheap.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
Catching memory overwrites
dbgheap1.cpp
Finding bugs with a debugger
Finding bugs without the debugger
mapfilt.exe
Making a better bug-trap
heap errors
double-deletes
delete uninitialized pointers
accessing deleted memory
memory leaks
malloc+free+realloc/new+delete/new[]+delete[]
multiple heaps
tracing callers
tracing
Statistic
Conclusion
[figure:membug.gif]
1:Hunting memory bugs
Have you ever hunted a memory overwrite bug? If you have you know how hard they can be to track down. Every programmers nightmare is when such a bug suddenly creps up in a large project.
The runtime libraries included with most compilers include a debug-version of the heap management functions but they usually first detect the overwrite after it has happened. Would it not be nice if you could detect the overwrite when it happens?
In the following sections I will implement a complete debugging/tracing library which will significantly reduce the time you spend tracing heap management bugs. I will not tell you how to avoid heap errors only how to detect and find them.
Note: I primarily use Watcom C/C++, but the tools have also been tested with VAC++ 3.0. The sources may contain information on what you have to do and/or change if you use VAC++. All sources should be easy to adapt to other compilers (BC/2, EMX, Metaware...)
2:Catching memory overwrites
OS/2 implements protection of the memory. An example is that writing to an area containing the code for the program will trap. eg:
code:
void foo() {
...
}
...
char *p=(char*)foo;
strcpy(p,"hello world");
:ecode
This code will trap due to OS/2's memory protection. The area where the code is is marked as read-only/execute-only.
OS/2 has an API to allocate and deallocate memory (DosAllocMem()/DosFreeMem()), but also an API to change the protection of a memory block: DosSetMem(). The protection (or "attributes") can be set individually for each page. A page is 4K.
The most common form of memory overwrite is to allocate too few bytes, usually due to an off-by-one error, eg:
code:
char *p=(char)malloc(11);
strcpy(p,"hello world");
:ecode
This code allocates 10 bytes but puts a 12-byte string (remember the terminating NUL) into it. The program may work, and worse: the program may crash several hours later in a completly unrelated function.
By replacing the standard heap management functions with our own functions we can take full control of how the memory is allocated.
I will only use three attributes of memory protection features: unknown, committed and uncomitted.
* unknown
= If we have never allocated a page in any way the virtual addresses are unknown. Accessing such a page causes a trap.
* comitted
= When a page is accessible (read/write) it is committed. Note: whether OS/2 really has reserved a physical piece of memory doesn't matter.
* uncomitted (or "reserved")
= It is also possible to only reserve range of virtual addresses without physical memory being reserved. Accessing such a page causes a trap.
Now back to catching memory overwrites. If we could allocate exactly 11 bytes an ensure that accessing memory outside the 11 bytes would cause a trap it would be a lot easier to find where the memory overwrite happens.
This can be done without too much trouble. By reserving enough pages to hold the requested number of bytes + 1 page and only committing the requested number of bytes and returning a pointer into the area so that accesses beyond the requested number of bytes cases a trap, we can catch memory overwrites in seconds instead of hunting them for days.
[figure:membug1.gif]
Note that each allocation uses at least 8K address space and at least 4K memory. Since OS/2 DosAllocMem() only allocates memory below 512MB you can only make 512MB/8K = 65536 allocations. It is even worse on some version of OS/2 (documentation says Warp3 and Warp4 server) where allocations reserves at least 64K address space (512M/64K = 8192 allocations). So if your program makes many allocations it may run out of address space. But that way you can also examine how well your program handles low memory conditions.
3:dbgheap1.cpp
The full source is in dbgheap1.cpp. dbgheap1.cpp implements replacements for new/delete/new[]/delete[]/malloc/free/realloc/calloc that uses the mechanism described in the previous section.
Here are a few highlights:
code:
struct chunk_header {
unsigned chunk_size; //# of comitted bytes
unsigned block_size; //heap block size (requested size)
unsigned block_offset; //offset from start of block to user-)pointer
};
:ecode
Each chunk of memory starts with a chunk_header. The chunk header contains information that we need when the chunk is to be DosFreeMem()'ed. It also contains redundant information so we can check for overwrites of the chunk_header.
Nexts comes a (maybe zero) number of bytes that are unused. These are filled with FILL_BYTE (0xfe) so they can also be checked for overwrites.
Then comes the area where the 'user' memory is. It is positioned so that the end of that area is aligned with the end of a page.
Then comes the important thing: an uncommitted page. If the program tries to use more bytes that were allocated it will trap.
When the memory is to be freed, the freemem() first converts the "user"-pointer (which points somewhere into a page) to a page-aligned pointer.
Then overwrite of the chunk_header is checked:
code:
if(chp->block_offset != chp->chunk_size-chp->block_size) {
exit(0);
}
:ecode
Redundancy helps debugging.
Then the "user"-pointer is checked. We only accept the exact same pointer that were returned from allocmem():
code:
if(p != pchunk+chp->block_offset) {
exit(0);
}
:ecode
Then the unused area that (hopefully) contains FILL_BYTE is checked:
code:
for(unsigned char *checkp=((unsigned char*)pchunk)+sizeof(chunk_header);
checkp<p;
checkp++)
{
if(*checkp != FILL_BYTE)
exit(0);
}
:ecode
Then the standard heap management functions are replaced: new/delete, new[]/delete[] and malloc/free/realloc/calloc. The replacements just calls allocmem() / freemem(). (Except realloc() which is a bit more complicated)
The end of dbgheap1.cpp contains a test stub:
:code
int main() {
char *p = new char[12];
strcpy(p,"Hello world"); //ok
strcpy(p,"Hello my friend. What's your name?"); //error
delete[] p;
return 0;
}
:ecode
12 bytes are allocated. Then a 12-byte string is copied to the allocated memory. This should work. Then a longer string is copied to the memory. This will cause the program to trap when strcpy() crosses the page boundary and tries to write to the uncommitted page.
Gotcha!
3:Finding bugs with a debugger
Try compiling dbgheap1.cpp and run the program from within you debugger.
When I do this using WDW (Watcoms debugger) it says: "A task exception has occurred: general protection fault" and then it leaves me in an assembler view of the naughty instruction in strcpy(). Knowing that strcpy() does not contain the bug I then select to trace the call stack "Code|Calls". hmmm-hmmm-hmmm. It seems that strcpy() was called from line 143 in main() in dbgheap1.cpp. What a surprise :-)
3:Finding bugs without the debugger
What if you did not run the program from within you debugger?
OS/2 roughly says: "A program has generated bla bla bla...". Select "show help" then it says:
code:
A program has generated a general protection fault at 0001041a
DBGHEAP1.EXE 0001:0000041a"
:ecode
This is usually not very helpful. But not any more. The program has accessed the virtual address 0001041a. The code that did that was in DBGHEAP1.EXE, object 0001, offset 0000041a. Where is that?
When called within Watcoms IDE Watcom's linker by default generates a .MAP file (wcl386 needs /fm). (VAC++ needs ICC /Fm or ILINK /MAP). This map file is unsorted but is contains most of the information you need to pinpoint the error.
I have included a small program that extracts the interesting lines in Watcoms .MAP file and sorts them by object+address (code for VAC++s' MAP files are #ifdef'ed away) Try running:
code:
mapfilt dbgheap1.map map.tmp
:ecode
Now search for 0001:0000041a:
code:
0001:000003a0 exit_
0001:000003cc+ _exit_
0001:000003f1 memcpy_
0001:00000416 strcpy_
0001:00000436 _cstart_
:ecode
The error obviously occurred in the function strcpy_. So now you know that the problem is a call to strcpy().
I will use the MAPFILT program later.
2:Making a better bug-trap
[figure:membug2.gif]
The program dbgheap1.cpp catches most memory overwrites. But what about other bugs?
3:Classes of heap errors
There exists several bugs concerning heap management:
[To editor/converters: make the following a definition-list]
Double-free:
Sometimes the bug is that the program tries to free a memory block that has already beed freed. The error can be the first free og the last free. If the error is the first free we need some sort of history to find out who did that.
Freeing uninitialized pointers:
Sometimes a program tries to free memory using a pointer that has not been assigned a value.
Accessing freed memory:
It is an error to access a memory block after it has freed. This can be difficult to detect since the memory block may have been used again.
Memory leaks:
Not freeing memory but just "forgetting it" is sometimes acceptable in very small programs (mapfilt.cpp is an example) but in programs that run for long periods of time this can be a major trouble. Imagine if a server program leaked 10 byte for every file accesss. How long would it take before SWAPPER.DAT would have grown to 200MB and the server had to be shut down? Not very long.
Mix of heap management functions:
Memory allocated with heap management (malloc/free/realloc) cannot be deallocated with C++ heap management (new/delete) and vice versa. Moreover C++ single-object heap management (new/delete) cannot be mixed with C++ array heap managment (new[]/delete[]).
Multiple heaps:
If you program consists of an .EXE and a .DLL, memory allocated in the DLL may be in a different heap that memory allocated in the .EXE. Some compilers support freeing what has been allocated in another module, but it is very bad pratice, since this could change in the next version of compiler or by the way the modules were linked.
By extending the capabilities of the debug-heap we can catch all these hard-to-find bugs.
3:Tracing callers
As you may have noted in an earlier section trapping inside allocmem() or freemem() is not very informative. It would be a big improvement if we knew who called malloc/free/new/delete/...
This information is on the stack but it is extremely difficult to locate. It not only depends on the compiler but also wether or not debugging information is generated, which optimization level you are using and the version of the compiler.
This can only be done 100% error-free in assembler.
This opens up another (but smaller) can of worms. Then we have to deal with name decoration, calling conventions and mangled names. Furtunatly both Watcom and VisualAge contains information on most of these questions. callpeek.asm (callpeek.vac for VAC++) contains very simple implementations of new/delete/malloc/... that simply grabs SS:[ESP] and jumps/calls to the real procedure dbg_new/dbg_delete/dbg_malloc/... in dbgheap2.cpp
3:Tracing
In order to post-mortem find out why the debug-heap-management found an error (and to create nifty things later on) the heap operations must be logged. I decided to log to a named pipe \pipe\tracemon since gives flexibility later on. tracemon.cpp implements a named-pipe server that simply writes whatever it gets through the named pipe to a file and standard output.
dbgheap2.cpp logs several things:
[note to editor or converters: make the following a definition-list]
TID
Multi-threaded heap errors can be a lot tougher to track down than single-threaded ones. Including the TID allows you to deect which thread allocated the memory and which thread freed the memory.
Heap
In order to detect multiple-heaps errors the heap has to be logged. The heap identification is simply the address of a static variable in dbgheap2. Since a .EXE and a .DLL which both have dbgheap2.cpp linked in statically will have separate static variables this is unique.
Operation
This can be malloc()/free()/new/delete/new[]/delete[].
Caller
This would also be nice.
Address range
This does not include the extra uncomitted page.
User pointer
The pointer that user-code sees/uses.
Size
The size of the allocation.
3:A few notes on dbgheap2.cpp
dbgheap2.cpp is derived from dbgheap1.cpp. These are the non-trivial changes:
- The housekeeping structure chunk_header has been extended with two members: mgrclass and heap to track management class and heap.
- A set of logging functions: log_operation(), log_range(), log_norange(), log_userptr(), log_operationfailure() and log_operationsuccess().
- heap_panic (located in callpeek.asm) is called when a heap error is detected. heap_panic() issues an interrupt 3 which gets the attention of the debugger if the program has been run from within one, or causes OS/2 to display an error message and terminate the program.
- freemem() does not deallocate the memory any longer is just decommits it to ensure that the memory is not reused and guarantees that "access to freed memory"-errors cause a trap.
3:Testing
Create a program from callpeek.asm, dbgheap2.cpp and bugs.cpp. For instance:
code:
Watcom:
wcl386 /s /fm bugs.cpp callpeek.asm dbgheap2.cpp
VAC++:
alp callpeek.vac
icc /Fm /B"NOE" bugs.cpp callpeek.obj dbgheap2.cpp
:ecode
Watcom needs to have stack checks turned off (/s), since malloc() is called early in the initialization when the stack has not been completely initialized. VAC++ linker (/B"/NOE") needs the /NOEXTDICTIONARY otherwise it complains about duplicated symbols (malloc()/free()/...)
Note: you can change the #ifdefs in bugs.cpp to test each bug type.
Start tracemon:
code:
start /f tracemon
:ecode
Launch your debugger.
Start bugs.EXE
Just press 'GO'
TRAP!
Which kind of trap occurs and where the debugger leave you depends on the bug.
"double-free", "free uninitialized pointers", "Mix of heap management" and "multiple heaps" is detected by dbgheap2.cpp and leaves you inside heap_panic(). Switch to tracemon.EXE (which you remembered to start, didn't you?) and read the last message. Then let the debugger display the call stack.
"accessing deleted memory" and "memory overwrite" (past the end of the block) causes an "access violation" trap where the bugs is.
"memory overwrite" (before the block) is detected when freemem() is called.
3:Memory leaks
Detecting memory leaks can only be done after the program has stopped. The algorithm is simple: just find allocations that were not freed.
leakfind.cpp does exactly this.
Try changing bugs.cpp to use the "memory leak" portion. Compile and link bugs.EXE, restart tracemon and run bugs.EXE.
After bugs.EXE has finished running the log should contain something like this (expect the header):
code:
TID Heap op. Caller Range Usrptr Bytes
00000001 00020024 malloc() 00010c67 00080000-00080fff 00080ffc 00000004
00000001 00020024 malloc() 00010ca2 00090000-00090fff 00090fba 00000046
00000001 00020024 malloc() 0001139a 000a0000-000a0fff 000a0f60 000000a0
00000001 00020024 new[] 00010032 000b0000-000b0fff 000b0ff6 0000000a
00000001 00020024 new[] 0001003f 000c0000-000c0fff 000c0ff6 0000000a
00000001 00020024 delete[] 00010050 000b0000-000b0fff 000b0ff6 0000000a
:ecode
The leaks are easy to spot in this small log but for large logs they are not.
Run leakfind:
code:
leakfind tracemon.log
:ecode
and leakfind says:
code:
Leaks:
00000001 00020024 malloc() 00010c67 00080000-00080fff 00080ffc 00000004
00000001 00020024 malloc() 00010ca2 00090000-00090fff 00090fba 00000046
00000001 00020024 malloc() 0001139a 000a0000-000a0fff 000a0f60 000000a0
00000001 00020024 new[] 0001003f 000c0000-000c0fff 000c0ff6 0000000a
:ecode
The first three allocation are made by Watcoms runtime-library and there is nothing we can do about them. The last one is more interesting. The caller is 0001003f. Knowing that Watcom puts code into object 0001 and that object 0001 is loaded at 00010000 helps a lot. So the problem is at 0001:0000003f
Using mapfilt.exe bugs.map bugs.lst gives us this:
code:
0000:00001234 __DOSseg__
0001:00000003 ___begtext
0001:00000010 main_
0001:000007d0 dbg_new_
:ecode
The large gap between mail_ and dbg_new_ is consumed by static functions in dbgheap2.cpp. Watcoms linker by default does not generate map entries for static functions. But 0001:0000003f is suspiciously near main_. Let's have a look at main() in bugs.cpp:
code:
void main(void) {
char *p1=new char[10];
char *p2=new char[10];
*p2='\A';
delete[] p1;
}
:ecode
Of course! p2 is not freed! What a surprise! <G>
2:Statistics
The log generated by dbgheap2.cpp can be used not only for debugging but also performance tuning. heapstat.cpp generates a simple statistic. "heapstat tracemon.log" gives a result like this:
code:
Heap statistics:
Size<= Operations Peak
4 0 0
8 0 0
16 2 2
256 1 1
1024 1 1
4096 3 3
32768 0 0
65536 0 0
4294967295 0 0
Size Count
10 2
39 1
388 1
4096 3
:ecode
heapstat.cpp tells you for some preselected values how many allocations there were of that size and maximum number of allocation that were at any time. It also shows size and count for individual sizes
This information can be used for performance tuning. Example 1: if there are extremely many allocations of a particular (small) size, it may be worthwhile to program a special-purpose allocator for that size. Example 2: If almost all allocations are larger than 16384 bytes it might be faster to code your own allocator that uses DosAllocMem() directly.
2:Conclusion
[figure:membug3.gif]
Good knowledge of what OS/2 can do combined with a few simple tools can be a very powerful. In fact, the tools described in this article can do more than most (all?) commercial heap-debug tools. And it's free!
What is missing? the capability to read debug-info to better pinpoint who called malloc/new/delete/..., automating starting/stopping tracemon. ...ooh yes and a sluggish GUI interface, limiting the tool to 1 or 2 compilers, a $600 bill and a 3-inch thick manual.
This is left as an exercise to the reader <G>
That's all folk.