15 llm content check (#26)
* docs: explain custom keywords

* feat: add llm content check and warning of inconsistencies

* docs: llm content warning

* tests: adjust for llm content check
Edward-Jackson-ONS authored Aug 21, 2024
1 parent b82328f commit 6cc6855
Showing 4 changed files with 62 additions and 6 deletions.
8 changes: 7 additions & 1 deletion README.md
@@ -55,7 +55,9 @@ Once pre-commits are activated, whenever you commit to this repository a series
## Usage
[theyworkforyou.com](https://www.theyworkforyou.com)
Although this product started out for internal use, with **Office for National Statistics** and **ONS** as search terms, users can specify their own keywords in this public edition. By amending the `keywords` variable in `src/parliai_public/_config/base.toml`, a parliamentary coverage report for different organisations, people or themes can be generated.
By default, parliamentary content from the previous day (and anything so far on the current day) will be reviewed. However, a number of flags are available for use from the command line to access historical content (as long as it is still available at source). The main time filtering behaviours can be summarised as follows:
- previous day (default) e.g.
@@ -83,6 +85,10 @@ $ python scripts/theyworkforyou.py -d 2024-05-24 -n 3
Additionally, the `-w` or `--weekly` flag can be used to generate a report for the previous week, e.g. a Wednesday to a Wednesday. The `-f` or `--form` flag can also be applied to specify a preferred date format (other than the default of `%Y-%m-%d`).
### Accuracy
An additional step has been added, at the post-processing stage, to verify that LLM responses are direct extracts from the original transcripts. Comparisons are made sentence by sentence once all punctuation has been removed. Where this condition is not satisfied, the LLM response is still used in the final report, but a user warning is appended as a reminder to exercise caution when consuming AI-generated content.
![LLM Content Warning](docs/images/llm-content-warning.png)
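A minimal sketch of this sentence-level check (hypothetical helper names; the project's actual implementation is `_check_response` in `src/parliai_public/readers/base.py`):

```python
import re


def normalise(text: str) -> str:
    """Lower-case and strip punctuation so comparisons ignore formatting."""
    return re.sub(r"[^\w\s]", "", text.lower())


def is_verbatim(response: str, transcript: str) -> bool:
    """True when every sentence of the response appears in the transcript."""
    source = normalise(transcript)
    return all(normalise(s) in source for s in response.split(". "))


source = "Yesterday, the ONS published new labour-market data in the House."
```

A lightly paraphrased response fails the check even when it is factually equivalent, which is why failures trigger a warning rather than removal of the entry.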
### Workflow
![Illustrative technical workflow](docs/images/parliai-public-workflow.png)
2 changes: 2 additions & 0 deletions src/parliai_public/_config/base.toml
@@ -3,3 +3,5 @@ keywords = ["Office for National Statistics", "ONS"]
prompt = ""
llm_name = ""
outdir = ""

inconsistency_statement = """:warning: **Inconsistencies between the LLM response and the original transcript have been detected. This does not necessarily mean that this report entry is erroneous, but you are strongly urged to exercise caution. Please click the link provided to check the original transcript at source.**"""
48 changes: 45 additions & 3 deletions src/parliai_public/readers/base.py
@@ -72,11 +72,13 @@ def __init__(
prompt: None | str = None,
llm_name: None | str = None,
llm: None | ChatOllama = None,
inconsistency_statement: None | str = None,
) -> None:
self.urls = urls
base_config = toml.load("src/parliai_public/_config/base.toml")
self.terms = terms or base_config["keywords"]
self.inconsistency_statement = (
    inconsistency_statement or base_config["inconsistency_statement"]
)
self.dates = dates or [dt.date.today() - dt.timedelta(days=1)]
self.outdir = outdir
@@ -369,6 +371,12 @@ def analyse(self, transcript: dict) -> dict:
for chunk in chunks:
if self.check_contains_terms(chunk.page_content):
response = self._analyse_chunk(chunk)

# flag any response that is not a verbatim extract of the source
if not self._check_response(response, chunk):
    response += f"\n\n{self.inconsistency_statement}"
    print("LLM response inconsistent with source.")

responses.append(response)

transcript["response"] = "\n\n".join(responses)
@@ -464,6 +472,40 @@ def _analyse_chunk(self, chunk: Document) -> str:

return response

def _check_response(self, response: str, chunk: Document) -> bool:
"""Check if LLM response appears verbatim in original text.
Parameters
----------
response : str
LLM response, lightly formatted.
chunk : langchain.docstore.document.Document
Document with the chunk contents.
Returns
-------
passed : bool
True/False the LLM response is present exactly in the original.
"""

# TODO: string formatting function to reduce code
original = chunk.page_content.lower()
original = re.sub(r"[^\w\s]", "", original)

for el in response.split(". "):
    el = el.lower()
    el = re.sub(r"[^\w\s]", "", el)

    if el not in original:
        return False

return True

def save(self, page: dict) -> None:
"""
Save an HTML entry to a more compact JSON format.
10 changes: 8 additions & 2 deletions tests/readers/base/test_reader.py
@@ -50,14 +50,15 @@ def test_load_config_default():
# TODO: keywords hardcoded not ideal for flexible keywords
expected = {
"urls": [],
"inconsistency_statement": "",
"keywords": ["Office for National Statistics", "ONS"],
"prompt": "",
"outdir": "",
"llm_name": "",
}
config = ToyReader._load_config()

assert config.keys() == expected.keys()


@given(st_terms_and_texts())
Expand Down Expand Up @@ -294,7 +295,12 @@ def test_analyse(params):
response = reader.analyse({"text": "foo"})

assert isinstance(response, dict) and "response" in response
assert response["response"].split("\n\n") == responses

# exclude the appended LLM content warning from this comparison
response_split = response["response"].split("\n\n")
assert [
x for x in response_split if not x.startswith(":warning:")
] == responses

splitter.assert_called_once_with("foo")

