
Customizable bad_token_ids policies #8

Open
dxoigmn opened this issue Jan 27, 2025 · 1 comment · Fixed by #44 · May be fixed by #41
dxoigmn (Contributor) commented Jan 27, 2025

llmart can ban "bad" tokens from the adversarial optimization.

Right now, bad_token_ids implements a static policy for what counts as a "bad" token (non-printable, non-ASCII, special, or whitespace-only):

def bad_token_ids(self) -> torch.Tensor:
    added_tokens = self.added_tokens_encoder.keys()
    # Decode every id in the vocabulary back to its string form
    tokens = [
        self.convert_tokens_to_string([token])
        for token in self.convert_ids_to_tokens(list(range(self.__vocab_size)))
    ]
    # True for tokens that are printable, ASCII-only, not special, and non-whitespace
    printable_tokens = torch.tensor(
        [
            token.isprintable()
            and token.isascii()
            and token not in added_tokens
            and len(token.strip()) > 0
            for token in tokens
        ],
    )
    # Indices of every token that fails the checks above
    return torch.where(~printable_tokens)[0]

Being able to add configurable policies would help with non-ASCII languages. Being able to ban an arbitrary set of tokens would also be useful.
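One possible shape for such a policy is a plain callable predicate over decoded token strings, with the current static checks as the default. A minimal sketch (the names TokenPolicy, make_policy, and bad_token_ids_for are illustrative only, not part of llmart):

```python
from typing import Callable, Iterable

import torch

# A policy maps a decoded token string to True when the token is allowed.
TokenPolicy = Callable[[str], bool]


def make_policy(
    allow_non_ascii: bool = False,
    banned_tokens: Iterable[str] = (),
) -> TokenPolicy:
    """Build a token policy; defaults mirror the current static behavior."""
    banned = set(banned_tokens)

    def policy(token: str) -> bool:
        if token in banned:
            return False
        # Reject non-printable or whitespace-only tokens in every policy
        if not token.isprintable() or len(token.strip()) == 0:
            return False
        # Optionally admit non-ASCII tokens for non-English vocabularies
        return allow_non_ascii or token.isascii()

    return policy


def bad_token_ids_for(tokens: list[str], policy: TokenPolicy) -> torch.Tensor:
    # Indices of tokens the policy rejects, same shape as bad_token_ids today
    allowed = torch.tensor([policy(t) for t in tokens])
    return torch.where(~allowed)[0]
```

With this shape, the default policy reproduces the current behavior, while `make_policy(allow_non_ascii=True, banned_tokens=[...])` covers both requests from this issue without touching the tokenizer itself.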

harshit-parikh-28 (Collaborator) commented:

@dxoigmn - Based on the bad_token_ids implementation, a token is currently identified as "bad" if it is non-printable or non-ASCII. These bad tokens are then ignored (banned) via the ignored_values: Tensor parameter, with ignored_values=tokenizer.bad_token_ids.

Based on the above understanding, I have a couple of questions:

  1. Would you prefer a configurable policy that allows end users to define what constitutes a bad token?
  2. How would end users configure or customize this policy for non-ASCII languages to identify bad tokens? Would this be done via CLI arguments for a set of specific non-ASCII characters?

I would appreciate more clarification on this issue.

mariusarvinte added a commit that referenced this issue Mar 7, 2025
Added customizable `bad_token_ids` policies. Fixes #8

---------

Signed-off-by: harshit-parikh-28 <[email protected]>
Signed-off-by: Marius Arvinte <[email protected]>
Co-authored-by: harshit-parikh-28 <[email protected]>
Co-authored-by: Marius Arvinte <[email protected]>
mariusarvinte linked a pull request Mar 7, 2025 that will close this issue