Add Harmbench integration #45

adarshan-intel · 2025-03-07T15:08:31Z

Overview:
We could not run Harmbench itself because of environment issues, and the README commands from its repo did not work here.
From what I could understand, --completions_path points to a file that contains the test_case_id and the generated_response from the model.
As per @mariusarvinte's suggestion, we have implemented this only for advbench_dataset. It is currently disabled for every other type of data input.

Utils.py
Added a JSON helper function that incrementally updates the JSON output file.
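For readers who want a picture of what such a helper can look like, here is a minimal sketch. It is not the code added in this PR: the function name `update_json_file`, the key/value layout, and the write-to-temp-then-replace strategy are all assumptions made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def update_json_file(path, key, value):
    """Merge one key/value pair into the JSON object stored at `path`.

    Hypothetical helper: creates the file if it does not exist yet and
    returns the full updated mapping.
    """
    path = Path(path)
    data = {}
    if path.exists():
        with path.open("r", encoding="utf-8") as f:
            data = json.load(f)

    # JSON object keys are always strings, so normalise the key up front.
    data[str(key)] = value

    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted run does not leave a half-written JSON file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp_name, path)
    return data
```

Calling it once per attacked sample, for example `update_json_file("completions.json", test_case_id, generated_response)`, keeps the file valid JSON after every step, at the cost of rereading the file on each call.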
Attack.py
Transformed the continuation prompt (tags + user + assistant responses) into the assistant response only, using the LLmart.Transformer class.
Enabled JSON logging only for data=advbench.
Indexed the JSON ids by subset id, as suggested by @mariusarvinte.
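The two ideas above (keeping only the assistant part of a decoded continuation, and keying each entry by its subset id) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `assistant_only`, `index_by_subset`, and the toy vocabulary are hypothetical, and the real code goes through the LLmart.Transformer class and the model tokenizer instead of the stand-in `decode` callable used here.

```python
def assistant_only(full_ids, prompt_len, decode):
    """Decode only the tokens that come after the prompt.

    `full_ids` holds prompt + generated token ids, `prompt_len` is the
    number of prompt tokens (tags + user turn + assistant header), and
    `decode` is any callable mapping token ids to text.
    """
    return decode(full_ids[prompt_len:]).strip()


def index_by_subset(subset, responses):
    """Key each response by its dataset index from data.subset,
    rather than by its position within the current run."""
    if len(subset) != len(responses):
        raise ValueError("subset and responses must have the same length")
    return {str(idx): resp for idx, resp in zip(subset, responses)}


# Toy stand-in for a tokenizer's decode method (assumption for the demo).
VOCAB = {0: "<user>", 1: "How", 2: "are", 3: "you",
         4: "<assistant>", 5: "Fine", 6: "thanks"}


def toy_decode(ids):
    return " ".join(VOCAB[i] for i in ids)


full_ids = [0, 1, 2, 3, 4, 5, 6]  # prompt tokens followed by the reply
response = assistant_only(full_ids, prompt_len=5, decode=toy_decode)
records = index_by_subset([2, 7], [response, "Sure"])
```

With data.subset=[2,7], the entries end up under ids "2" and "7" instead of "0" and "1", so results from runs over different subsets can be merged without id collisions.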

Input command:
python -m llmart model=llama3-8b-instruct data=advbench_behavior data.subset=[0,1,2,3] loss=model steps=1
Execution Output:

Note: The JSON output is structured exactly in the format expected by the Harmbench evaluation script.

Fixes #11

Signed-off-by: Ajith Raj <[email protected]>


ajithraj-intel added 2 commits March 7, 2025 20:02

Added logging for output responses

c1c7cf1

Signed-off-by: Ajith Raj <[email protected]>

enabled json_output only for run_attack

e083069

Signed-off-by: Ajith Raj <[email protected]>

adarshan-intel added the hackathon label Mar 7, 2025

adarshan-intel requested a review from dxoigmn March 7, 2025 15:08

adarshan-intel assigned ajithraj-intel Mar 7, 2025

ajithraj-intel added 2 commits March 8, 2025 06:21

Added tokenizer based decoded response, and added subset based indexing in json output

35d5da0

Signed-off-by: Ajith Raj <[email protected]>

Fixed indentation issues

c7c3f3f

Signed-off-by: Ajith Raj <[email protected]>


adarshan-intel commented Mar 7, 2025 •

edited by ajithraj-intel
