
Customizable bad_token_ids policies #8

Open
dxoigmn opened this issue Jan 27, 2025 · 1 comment · Fixed by #44 · May be fixed by #41
dxoigmn (Contributor) commented Jan 27, 2025

llmart can ban "bad" tokens from the adversarial optimization.

Right now, bad_token_ids implements a static policy for what counts as a "bad" token (non-printable, non-ASCII, special, or whitespace-only):

def bad_token_ids(self) -> torch.Tensor:
    added_tokens = self.added_tokens_encoder.keys()
    # Decode every id in the vocabulary back to its string form
    tokens = [
        self.convert_tokens_to_string([token])
        for token in self.convert_ids_to_tokens(list(range(self.__vocab_size)))
    ]
    # True for tokens that are printable, ASCII-only, not special, and non-whitespace
    printable_tokens = torch.tensor(
        [
            token.isprintable()
            and token.isascii()
            and token not in added_tokens
            and len(token.strip()) > 0
            for token in tokens
        ],
    )
    # Indices of every token that fails the checks above
    return torch.where(~printable_tokens)[0]

Being able to add configurable policies would help with non-ASCII languages. Being able to ban an arbitrary set of tokens would also be useful.
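One possible shape for such a policy is a plain callable predicate over decoded token strings, with the current static checks as the default. A minimal sketch (the names TokenPolicy, make_policy, and bad_token_ids_for are illustrative only, not part of llmart):

```python
from typing import Callable, Iterable

import torch

# A policy maps a decoded token string to True when the token is allowed.
TokenPolicy = Callable[[str], bool]


def make_policy(
    allow_non_ascii: bool = False,
    banned_tokens: Iterable[str] = (),
) -> TokenPolicy:
    """Build a token policy; defaults mirror the current static behavior."""
    banned = set(banned_tokens)

    def policy(token: str) -> bool:
        if token in banned:
            return False
        # Reject non-printable or whitespace-only tokens in every policy
        if not token.isprintable() or len(token.strip()) == 0:
            return False
        # Optionally admit non-ASCII tokens for non-English vocabularies
        return allow_non_ascii or token.isascii()

    return policy


def bad_token_ids_for(tokens: list[str], policy: TokenPolicy) -> torch.Tensor:
    # Indices of tokens the policy rejects, same shape as bad_token_ids today
    allowed = torch.tensor([policy(t) for t in tokens])
    return torch.where(~allowed)[0]
```

With this shape, the default policy reproduces the current behavior, while `make_policy(allow_non_ascii=True, banned_tokens=[...])` covers both requests from this issue without touching the tokenizer itself.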

harshit-parikh-28 (Collaborator) commented:

@dxoigmn - Based on the bad_token_ids implementation, a token is currently identified as "bad" if it is non-printable or non-ASCII. These bad tokens are then ignored (banned) via the ignored_values: Tensor parameter, with ignored_values=tokenizer.bad_token_ids.

Based on the above understanding, I have a couple of questions:

  1. Would you prefer a configurable policy that allows end users to define what constitutes a bad token?
  2. How would end users configure or customize this policy for non-ASCII languages to identify bad tokens? Would this be done via CLI arguments for a set of specific non-ASCII characters?

I would appreciate more clarification on this issue.

mariusarvinte added a commit that referenced this issue Mar 7, 2025
Added customizable `bad_token_ids` policies. Fixes #8

---------

Signed-off-by: harshit-parikh-28 <[email protected]>
Signed-off-by: Marius Arvinte <[email protected]>
Co-authored-by: harshit-parikh-28 <[email protected]>
Co-authored-by: Marius Arvinte <[email protected]>
mariusarvinte linked a pull request Mar 7, 2025 that will close this issue