Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

harness: Detector only #833

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
d9e7f28
detectoronly harness
vidushiMaheshwari Aug 15, 2024
c8b7e77
Update attempt.py
vidushiMaheshwari Aug 15, 2024
13c89fe
change
vidushiMaheshwari Aug 15, 2024
b87068f
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 15, 2024
df107c2
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
d9d43a6
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
662fbf0
100_pass_mod
vidushiMaheshwari Aug 15, 2024
15c1097
change
vidushiMaheshwari Aug 15, 2024
3f8e263
Update attempt.py
vidushiMaheshwari Aug 15, 2024
4714757
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
9f63ab1
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
1b5aa46
100_pass_mod
vidushiMaheshwari Aug 15, 2024
239cfc8
docs
vidushiMaheshwari Aug 19, 2024
65c0c36
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 19, 2024
e9eb742
harness config options and files
vidushiMaheshwari Aug 19, 2024
a35cb5a
Probewise harness is a dictionary instead of attributed class
vidushiMaheshwari Aug 20, 2024
eab5c67
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
c3d33d4
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
153ff2e
Update garak/harnesses/base.py
vidushiMaheshwari Aug 22, 2024
5086e80
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
6526194
Merge branch 'main' of https://github.com/vidushiMaheshwari/garak int…
vidushiMaheshwari Aug 23, 2024
a7b3e1f
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 23, 2024
dbe916a
decouple harness only run from execution
vidushiMaheshwari Sep 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions docs/source/garak.harnesses.detectoronly.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
garak.harnesses.detectoronly
====================

.. automodule:: garak.harnesses.detectoronly
:members:
:undoc-members:
:show-inheritance:

1 change: 1 addition & 0 deletions docs/source/harnesses.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,5 +6,6 @@ garak.harnesses

garak.harnesses
garak.harnesses.base
garak.harnesses.detectoronly
garak.harnesses.probewise
garak.harnesses.pxd
18 changes: 18 additions & 0 deletions garak/attempt.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,24 @@ def as_dict(self) -> dict:
"messages": self.messages,
}

@classmethod
def from_dict(cls, dicti):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- out of scope for here, but we should implement serialization/deserialization for Attempts

"""Initializes an attempt object from dictionary"""
attempt_obj = cls()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this skip the attempt constructor? Can we add an explicit type signature to signal what cls is expected to be?

Copy link
Collaborator

@jmartin-tech jmartin-tech Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cls is the callable for the class which will be an Attempt. This will call the __init__() method with all defaults.

Due to the current overrides in the class attempt_obj.outputs below may not produce the same in memory object for a multi-turn conversation attempt since the existing as_dict() method serialized outputs into the log and not the full messages history.

For the purposes of this PR I suspect this is acceptable, however it is worth noting.

attempt_obj.uuid = dicti['uuid']
attempt_obj.seq = dicti['seq']
attempt_obj.status = dicti['status']
attempt_obj.probe_classname = dicti['probe_classname']
attempt_obj.probe_params = dicti['probe_params']
attempt_obj.targets = dicti['targets']
attempt_obj.prompt = dicti['prompt']
attempt_obj.outputs = dicti['outputs']
attempt_obj.detector_results = dicti['detector_results']
attempt_obj.notes = dicti['notes']
attempt_obj.goal = dicti['goal']
attempt_obj.messages = dicti['messages']
return attempt_obj

def __getattribute__(self, name: str) -> Any:
"""override prompt and outputs access to take from history"""
if name == "prompt":
Expand Down
61 changes: 55 additions & 6 deletions garak/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,21 @@ def main(arguments=None) -> None:
action="store_true",
help="If detectors aren't specified on the command line, should we run all detectors? (default is just the primary detector, if given, else everything)",
)

# harness
parser.add_argument(
"--harness_options",
type=str,
help="Type of harness to use."
)

parser.add_argument(
"--harness_option_file",
"-H",
type=str,
help="path to JSON file containing information harness options"
)

# buffs
parser.add_argument(
"--buffs",
Expand Down Expand Up @@ -337,7 +352,7 @@ def main(arguments=None) -> None:
import garak.evaluators

try:
plugin_types = ["probe", "generator"]
plugin_types = ["probe", "generator", "harness"]
# do a special thing for CLI probe options, generator options
for plugin_type in plugin_types:
opts_arg = f"{plugin_type}_options"
Expand Down Expand Up @@ -368,7 +383,10 @@ def main(arguments=None) -> None:
)
raise e

config_plugin_type = getattr(_config.plugins, f"{plugin_type}s")
if plugin_type.endswith('s'):
config_plugin_type = getattr(_config.plugins, f"{plugin_type}es")
else:
config_plugin_type = getattr(_config.plugins, f"{plugin_type}s")

config_plugin_type = _config._combine_into(
opts_cli_config, config_plugin_type
Expand Down Expand Up @@ -499,19 +517,50 @@ def main(arguments=None) -> None:
print(f"📜 reporting to {_config.transient.report_filename}")

if parsed_specs["detector"] == []:
command.probewise_run(
generator, parsed_specs["probe"], evaluator, parsed_specs["buff"]
)
command.probewise_run(generator, parsed_specs["probe"], evaluator, parsed_specs["buff"])
else:
command.pxd_run(
generator,
parsed_specs["probe"],
parsed_specs["detector"],
evaluator,
parsed_specs["buff"],
parsed_specs["buff"]
)

if "detectoronly" not in _config.plugins.harnesses:
command.end_run()

elif "detectoronly" in _config.plugins.harnesses:

if "report_path" not in _config.plugins.harnesses["DetectorOnly"]:
logging.error("report path not specified")
raise ValueError("Specify jsonl report path using report_path")

if not parsed_specs["detector"]: # If the user doesn't specify any detectors, repeat the same as the reoport's
# read from the report
logging.info("Using detectors from the report file")
f = open(_config.plugins.harnesses["DetectorOnly"]["report_path"], "r")
entry_line = None
while True:
line = f.readline()
line = json.loads(line)
if 'entry_type' in line and line['entry_type'] == 'start_run setup':
entry_line = line
break
if not line:
break # EOF

if not entry_line:
raise ValueError("Report file missing setup entry")

parsed_specs["detector"] = entry_line['plugins.detector_spec'].split(',')

if not _config.plugins.model_type: # Have not initialized the file yet
command.start_run()

command.detector_only_run(parsed_specs["detector"], evaluator)
command.end_run()

else:
print("nothing to do 🤷 try --help")
if _config.plugins.model_name and not _config.plugins.model_type:
Expand Down
20 changes: 20 additions & 0 deletions garak/command.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@ def start_run():
list,
set,
type(None),
float, # Without float eval_threshold was not being stored
):
setup_dict[f"{subset}.{k}"] = v

Expand Down Expand Up @@ -255,3 +256,22 @@ def write_report_digest(report_filename, digest_filename):
digest = report_digest.compile_digest(report_filename)
with open(digest_filename, "w", encoding="utf-8") as f:
f.write(digest)

def detector_only_run(detectors, evaluator):
import garak.harnesses.detectoronly
import garak.attempt
from garak import _config

config = _config.plugins.harnesses["DetectorOnly"]

with open(config["report_path"]) as f:
data = [json.loads(line) for line in f]

data = [d for d in data if d["entry_type"] == "attempt" and d["status"] == 1]
attempts = [garak.attempt.Attempt.from_dict(d) for d in data]

if len(attempts) == 0:
raise ValueError("No attempts found in report.jsonl")

detector_only_h = garak.harnesses.detectoronly.DetectorOnly()
detector_only_h.run(attempts, detectors, evaluator)
47 changes: 24 additions & 23 deletions garak/harnesses/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,28 +109,29 @@ def run(self, model, probes, detectors, evaluator, announce_probe=True) -> None:
assert isinstance(
attempt_results, (list, types.GeneratorType)
), "probing should always return an ordered iterable"

for d in detectors:
logging.debug("harness: run detector %s", d.detectorname)
attempt_iterator = tqdm.tqdm(attempt_results, leave=False)
detector_probe_name = d.detectorname.replace("garak.detectors.", "")
attempt_iterator.set_description("detectors." + detector_probe_name)
for attempt in attempt_iterator:
attempt.detector_results[detector_probe_name] = list(
d.detect(attempt)
)

for attempt in attempt_results:
attempt.status = garak.attempt.ATTEMPT_COMPLETE
_config.transient.reportfile.write(json.dumps(attempt.as_dict()) + "\n")

if len(attempt_results) == 0:
logging.warning(
"zero attempt results: probe %s, detector %s",
probe.probename,
detector_probe_name,
)
else:
evaluator.evaluate(attempt_results)
self.run_detectors(detectors, attempt_results, evaluator, probe)

logging.debug("harness: probe list iteration completed")

def run_detectors(self, detectors, attempt_results, evaluator, probe=None):
for d in detectors:
logging.debug("harness: run detector %s", d.detectorname)
attempt_iterator = tqdm.tqdm(attempt_results, leave=False)
detector_probe_name = d.detectorname.replace("garak.detectors.", "")
attempt_iterator.set_description("detectors." + detector_probe_name)
for attempt in attempt_iterator:
attempt.detector_results[detector_probe_name] = list(
d.detect(attempt)
)

for attempt in attempt_results:
attempt.status = garak.attempt.ATTEMPT_COMPLETE
_config.transient.reportfile.write(json.dumps(attempt.as_dict()) + "\n")

if len(attempt_results) == 0:
msg = "zero attempt results: attempt %s" % attempt.uuid()
if probe is not None:
msg += ", probe %s" % probe.probename
logging.warning(msg)
else:
evaluator.evaluate(attempt_results)
44 changes: 44 additions & 0 deletions garak/harnesses/detectoronly.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# SPDX-FileCopyrightText: Portions Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Detector only harness

Runs specified detectors on already existing prompt-response pairs from parsing a report.jsonl file.
"""

import logging

from garak import _config, _plugins
from garak.harnesses import Harness
from garak.detectors import Detector

class DetectorOnly(Harness):
def __init__(self, config_root=_config):
super().__init__(config_root)

def _load_detector(self, detector_name: str) -> Detector:
detector = _plugins.load_plugin(
"detectors." + detector_name, break_on_fail=False
)
if detector:
return detector
else:
print(f" detector load failed: {detector_name}, skipping >>")
logging.error(f" detector load failed: {detector_name}, skipping >>")
return False

def run(self, attempts, detector_names, evaluator):
detectors = []
for detector in sorted(detector_names):
d = self._load_detector(detector)
if d:
detectors.append(d)

if len(detectors) == 0:
msg = "No detectors, nothing to do"
logging.warning(msg)
if hasattr(_config.system, "verbose") and _config.system.verbose >= 2:
print(msg)
raise ValueError(msg)

super().run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this work?

Suggested change
super().run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.
self.run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.

1 change: 1 addition & 0 deletions garak/resources/garak.core.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ run:
eval_threshold: 0.5
generations: 10
probe_tags:
probed_report_path:

plugins:
model_type:
Expand Down
Loading