Commit d3e76b5

Ablation Study
1 parent 828c79d commit d3e76b5

3 files changed

+119
-25
lines changed

index.html

+103-23
@@ -78,7 +78,8 @@ <h2>Description</h2>
7878
We use the CLAP loss as an example, confirming that end-to-end fine-tuning further boosts the generation quality.
7979
</p>
8080
<p>
81-
<b>Please join us at <a href="https://interspeech2024.org" target="_blank">INTERSPEECH 2024</a> at Kos Island, Greece!</b>
81+
<b>Please check out <a href="poster.pdf" target="_blank">our poster</a> at
82+
<a href="https://interspeech2024.org" target="_blank">INTERSPEECH 2024</a> on Kos Island, Greece!</b>
8283
</p>
8384
</section>
8485

@@ -113,40 +114,36 @@ <h2>Main Experiment Results</h2>
113114
</thead>
114115
<tbody>
115116
<tr class="result-row-2" style="color: #898989">
116-
<td class="result-data-small"><span style="font-weight: 400;">AudioLDM-L (Baseline)</span></td>
117-
<td class="result-data-2">400</td> <td class="result-data-2">-</td> <td class="result-data">-</td>
117+
<td class="result-data-small">AudioLDM-L (Baseline)</td> <td class="result-data-2">400</td>
118+
<td class="result-data-2">-</td> <td class="result-data">-</td>
118119
<td class="result-data">-</td> <td class="result-data-2">-</td> <td class="result-data-2">-</td>
119-
<td class="result-data-2"><span style="font-weight: 400;">2.08</span></td> <td class="result-data-2">27.12</td>
120-
<td class="result-data-2">1.86</td>
120+
<td class="result-data-2-400">2.08</td> <td class="result-data-2">27.12</td> <td class="result-data-2">1.86</td>
121121
</tr>
122122
<tr class="result-row-2" style="color: #898989">
123-
<td class="result-data-small"><span style="font-weight: 400;">TANGO (Baseline)</span></td>
123+
<td class="result-data-small">TANGO (Baseline)</td>
124124
<td class="result-data-2">400</td> <td class="result-data-2">168</td>
125125
<td class="result-data"><b>4.136</b></td> <td class="result-data"><b>4.064</b></td>
126-
<td class="result-data-2"><span style="font-weight: 400;">24.10</span></td> <td class="result-data-2"><b>72.85</b></td>
127-
<td class="result-data-2"><b>1.631</b></td> <td class="result-data-2"><b>20.11</b></td>
128-
<td class="result-data-2">1.362</td>
126+
<td class="result-data-2-400">24.10</td> <td class="result-data-2"><b>72.85</b></td>
127+
<td class="result-data-2"><b>1.631</b></td> <td class="result-data-2"><b>20.11</b></td> <td class="result-data-2">1.362</td>
129128
</tr>
130129
<tr class="result-row">
131-
<td class="result-data-small"><span style="font-weight: 400;">ConsistencyTTA + CLAP-FT</span></td>
130+
<td class="result-data-small">ConsistencyTTA + CLAP-FT</td>
132131
<td class="result-data-2"><b>1</b></td> <td class="result-data-2"><b>2.3</b></td>
133132
<td class="result-data">3.830</td> <td class="result-data"><b>4.064</b></td>
134-
<td class="result-data-2"><b>24.69</b></td> <td class="result-data-2"><span style="font-weight: 400;">72.54</span></td>
135-
<td class="result-data-2">2.406</td> <td class="result-data-2"><span style="font-weight: 400;">20.97</span></td>
136-
<td class="result-data-2"><span style="font-weight: 400;">1.358</span></td>
133+
<td class="result-data-2"><b>24.69</b></td> <td class="result-data-2-400">72.54</td>
134+
<td class="result-data-2">2.406</td> <td class="result-data-2-400">20.97</td> <td class="result-data-2-400">1.358</td>
137135
</tr>
138136
<tr class="result-row">
139-
<td class="result-data-small"><span style="font-weight: 400;">ConsistencyTTA</span></td>
137+
<td class="result-data-small">ConsistencyTTA</td>
140138
<td class="result-data-2"><b>1</b></td> <td class="result-data-2"><b>2.3</b></td>
141-
<td class="result-data"><span style="font-weight: 400;">3.902</span></td> <td class="result-data">4.010</td>
139+
<td class="result-data-400">3.902</td> <td class="result-data">4.010</td>
142140
<td class="result-data-2">22.50</td> <td class="result-data-2">72.30</td>
143141
<td class="result-data-2">2.575</td> <td class="result-data-2">22.08</td>
144142
<td class="result-data-2"><b>1.354</b></td>
145143
</tr>
146144
<tr class="result-row-2-small" style="color: #898989">
147-
<td class="result-data"><span style="font-weight: 400;">Ground Truth</span></td>
148-
<td class="result-data-2">-</td> <td class="result-data-2">-</td>
149-
<td class="result-data">-</td> <td class="result-data">-</td>
145+
<td class="result-data-small">Ground Truth</td> <td class="result-data-2">-</td>
146+
<td class="result-data-2">-</td> <td class="result-data">-</td> <td class="result-data">-</td>
150147
<td class="result-data-2">26.71</td> <td class="result-data-2">100</td>
151148
<td class="result-data-2">-</td> <td class="result-data-2">-</td> <td class="result-data-2">-</td>
152149
</tr>
@@ -155,7 +152,90 @@ <h2>Main Experiment Results</h2>
155152
<p>
156153
<a href="https://paperswithcode.com/sota/audio-generation-on-audiocaps" target="_blank">This benchmark</a>
157154
demonstrates how our single-step models stack up against previous methods,
158-
most of which mostly require hundreds of generation steps.
155+
most of which require hundreds of generation steps.
156+
</p>
157+
</section>
158+
159+
<section class="section">
160+
<h2>Ablation Studies on Distillation Settings</h2>
161+
<p>
162+
<table class="result-table">
163+
<thead>
164+
<tr class="result-row">
165+
<th class="result-head">Guidance Method</th>
166+
<th class="result-head">CFG Weight</th>
167+
<th class="result-head">Teacher Solver</th>
168+
<th class="result-head">Noise Schedule</th>
169+
<th class="result-head-2">FAD ↓</th>
170+
<th class="result-head-2">FD ↓</th>
171+
<th class="result-head-2">KLD ↓</th>
172+
</tr>
173+
</thead>
174+
<tbody>
175+
<tr class="result-row-2">
176+
<td class="result-data-small">Unguided</td>
177+
<td class="result-data-small">1</td>
178+
<td class="result-data-small">DDIM</td>
179+
<td class="result-data-small">Uniform</td>
180+
<td class="result-data-2">13.48</td>
181+
<td class="result-data-2">45.75</td>
182+
<td class="result-data-2">2.409</td>
183+
</tr>
184+
<tr class="result-row-2">
185+
<td class="result-data-small" rowspan="2">External CFG</td>
186+
<td class="result-data-small" rowspan="2">3</td>
187+
<td class="result-data-small">DDIM</td>
188+
<td class="result-data-small">Uniform</td>
189+
<td class="result-data-2">8.565</td>
190+
<td class="result-data-2">38.67</td>
191+
<td class="result-data-2">2.015</td>
192+
</tr>
193+
<tr class="result-row-2">
194+
<td class="result-data-small">Heun</td>
195+
<td class="result-data-small">Karras</td>
196+
<td class="result-data-2">7.421</td>
197+
<td class="result-data-2">39.36</td>
198+
<td class="result-data-2">1.976</td>
199+
</tr>
200+
<tr class="result-row-2">
201+
<td class="result-data-small" rowspan="2">CFG Distillation<br>with Fixed Weight</td>
202+
<td class="result-data-small" rowspan="2">3</td>
203+
<td class="result-data-small" rowspan="2">Heun</td>
204+
<td class="result-data-small">Karras</td>
205+
<td class="result-data-2">5.702</td>
206+
<td class="result-data-2">33.18</td>
207+
<td class="result-data-2">1.494</td>
208+
</tr>
209+
<tr class="result-row-2">
210+
<td class="result-data-small">Uniform</td>
211+
<td class="result-data-2">3.859</td>
212+
<td class="result-data-2"><b>27.79</b></td>
213+
<td class="result-data-2">1.421</td>
214+
</tr>
215+
<tr class="result-row-2">
216+
<td class="result-data-small" rowspan="3">CFG Distillation<br>with Random Weight</td>
217+
<td class="result-data-small">4</td>
218+
<td class="result-data-small" rowspan="2">Heun</td>
219+
<td class="result-data-small" rowspan="2">Uniform</td>
220+
<td class="result-data-2-400">3.180</td>
221+
<td class="result-data-2-400">27.92</td>
222+
<td class="result-data-2-400">1.394</td>
223+
</tr>
224+
<tr class="result-row-2">
225+
<td class="result-data-small">6</td>
226+
<td class="result-data-2"><b>2.975</b></td>
227+
<td class="result-data-2">28.63</td>
228+
<td class="result-data-2"><b>1.378</b></td>
229+
</tr>
230+
</tbody>
231+
</table>
232+
Based on these results, we can conclude that:
233+
<ul>
234+
<li>CFG distillation with random weight is more effective than fixed weight,
235+
which is more effective than external CFG.</li>
236+
<li>Heun is a better teacher solver than DDIM, and
237+
the Uniform noise schedule outperforms the Karras noise schedule.</li>
238+
</ul>
159239
</p>
160240
</section>
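For readers unfamiliar with the "CFG Weight" column, here is a minimal sketch of classifier-free guidance. It assumes the common convention in which a weight of 1 reduces to the purely conditional prediction, which is consistent with the "Unguided" row listing a weight of 1. The function name `cfg_combine` and the toy vectors are hypothetical and are not taken from the repository.

```python
def cfg_combine(cond, uncond, weight):
    """Blend conditional and unconditional predictions element-wise.

    Assumed convention: weight * cond + (1 - weight) * uncond,
    so weight == 1 returns the conditional prediction unchanged.
    """
    return [weight * c + (1.0 - weight) * u for c, u in zip(cond, uncond)]


# Toy predictions (illustrative values only, not model outputs).
cond = [0.5, -1.0, 2.0]
uncond = [0.0, 0.0, 1.0]

unguided = cfg_combine(cond, uncond, 1.0)  # weight 1: no guidance
guided = cfg_combine(cond, uncond, 3.0)    # weight 3: extrapolate away from uncond
```

How the distillation variants in the table consume this weight is not shown in the commit and is not modeled here.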
161241

@@ -183,11 +263,11 @@ <h2>Human Evaluation</h2>
183263
<h2>Citing Our Work (BibTeX)</h2>
184264
<div id="bibtex1" class="bibtex" onclick="copyToClipboard('bibtex1')">
185265
<i class="far fa-copy copy-icon"></i>
186-
<pre>@article{bai2023accelerating,
266+
<pre>@inproceedings{bai2024accelerating,
187267
author = {Bai, Yatong and Dang, Trung and Tran, Dung and Koishida, Kazuhito and Sojoudi, Somayeh},
188-
title = {Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation},
189-
journal={arXiv preprint arXiv:2309.10740},
190-
year = {2023}
268+
title = {ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation},
269+
booktitle = {INTERSPEECH},
270+
year = {2024}
191271
}</pre>
192272
</div>
193273
</section>
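As a quick sanity check on the ablation table added in this commit, the sketch below transcribes its rows (with the `rowspan` cells expanded) and finds the best value per metric, so the bolded cells can be verified. The values come from the table itself; the names `ROWS`, `METRICS`, and `best_rows` are hypothetical helpers, not part of the repository.

```python
# Rows transcribed from the ablation table in this commit:
# (guidance, cfg_weight, solver, schedule, FAD, FD, KLD); lower is better.
ROWS = [
    ("Unguided", 1, "DDIM", "Uniform", 13.48, 45.75, 2.409),
    ("External CFG", 3, "DDIM", "Uniform", 8.565, 38.67, 2.015),
    ("External CFG", 3, "Heun", "Karras", 7.421, 39.36, 1.976),
    ("CFG Distillation with Fixed Weight", 3, "Heun", "Karras", 5.702, 33.18, 1.494),
    ("CFG Distillation with Fixed Weight", 3, "Heun", "Uniform", 3.859, 27.79, 1.421),
    ("CFG Distillation with Random Weight", 4, "Heun", "Uniform", 3.180, 27.92, 1.394),
    ("CFG Distillation with Random Weight", 6, "Heun", "Uniform", 2.975, 28.63, 1.378),
]

# Column index of each metric within a row tuple.
METRICS = {"FAD": 4, "FD": 5, "KLD": 6}


def best_rows(rows):
    """Return, for each metric, the row with the lowest value."""
    return {name: min(rows, key=lambda r: r[idx]) for name, idx in METRICS.items()}


best = best_rows(ROWS)
for name, row in best.items():
    print(f"{name}: {row[METRICS[name]]} ({row[0]}, weight {row[1]})")
```

The minima match the bolded cells: the random-weight row with weight 6 has the lowest FAD and KLD, while the fixed-weight row with the Uniform schedule has the lowest FD.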

poster.pdf

292 KB
Binary file not shown.

styles.css

+16-2
@@ -292,17 +292,31 @@ tr td:last-child {
292292
padding: 7px 12px;
293293
border-bottom: 1px solid #e7ebef;
294294
background-color: #dfe3f241;
295-
font-size: 1.2em;
295+
font-size: 1.15em;
296+
}
297+
.result-data-400 {
298+
padding: 7px 12px;
299+
border-bottom: 1px solid #e7ebef;
300+
background-color: #dfe3f241;
301+
font-size: 1.15em;
302+
font-weight: 400;
296303
}
297304
.result-data-2 {
298305
padding: 7px 12px;
299306
border-bottom: 1px solid #e7ebef;
300-
font-size: 1.2em;
307+
font-size: 1.15em;
308+
}
309+
.result-data-2-400 {
310+
padding: 7px 12px;
311+
border-bottom: 1px solid #e7ebef;
312+
font-size: 1.15em;
313+
font-weight: 400;
301314
}
302315
.result-data-small {
303316
padding: 7px 12px;
304317
border-bottom: 1px solid #e7ebef;
305318
background-color: #dfe3f241;
319+
font-weight: 400;
306320
}
307321

308322
/* Optional: Add transitions for smoother hover effects */
