DLP: Added sample for inspect string with custom regex #3107
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
// Copyright 2023 Google LLC | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
'use strict'; | ||
|
||
// sample-metadata: | ||
// title: Inspects strings | ||
// description: Inspects a string using custom regex pattern. | ||
// usage: node inspectWithCustomRegex.js my-project string minLikelihood maxFindings infoTypes customInfoTypes includeQuote | ||
|
||
function main( | ||
projectId, | ||
string, | ||
minLikelihood, | ||
maxFindings, | ||
infoTypes, | ||
customInfoTypes, | ||
includeQuote | ||
) { | ||
[infoTypes, customInfoTypes] = transformCLI(infoTypes, customInfoTypes); | ||
// [START dlp_inspect_custom_regex] | ||
// Imports the Google Cloud Data Loss Prevention library | ||
const DLP = require('@google-cloud/dlp'); | ||
|
||
// Instantiates a client | ||
const dlp = new DLP.DlpServiceClient(); | ||
|
||
// The project ID to run the API call under | ||
// const projectId = 'my-project'; | ||
|
||
// The string to inspect | ||
// const string = 'Patients MRN 444-5-22222'; | ||
|
||
// The minimum likelihood required before returning a match | ||
// const minLikelihood = DLP.protos.google.privacy.dlp.v2.Likelihood.POSSIBLE; | ||
|
||
// The maximum number of findings to report per request (0 = server maximum) | ||
// const maxFindings = 0; | ||
|
||
// The infoTypes of information to match | ||
// See https://cloud.google.com/dlp/docs/concepts-infotypes for more information | ||
// about supported infoTypes. | ||
// const infoTypes = [{ name: 'EMAIL_ADDRESS' }]; | ||
|
||
// The customInfoTypes of information to match | ||
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}}, | ||
// { infoType: { name: 'REGEX_TYPE' }, regex: {pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}'}}]; | ||
|
||
// Whether to include the matching string | ||
// const includeQuote = true; | ||
|
||
async function inspectWithCustomRegex() { | ||
// Construct item to inspect | ||
const item = { | ||
byteItem: { | ||
type: DLP.protos.google.privacy.dlp.v2.ByteContentItem.BytesType | ||
.TEXT_UTF8, | ||
data: Buffer.from(string, 'utf-8'), | ||
}, | ||
}; | ||
|
||
// Assigns likelihood to each match | ||
customInfoTypes = customInfoTypes.map(customInfoType => { | ||
customInfoType.likelihood = | ||
DLP.protos.google.privacy.dlp.v2.Likelihood.POSSIBLE; | ||
return customInfoType; | ||
}); | ||
|
||
// Construct request | ||
const request = { | ||
parent: `projects/${projectId}/locations/global`, | ||
inspectConfig: { | ||
infoTypes: infoTypes, | ||
customInfoTypes: customInfoTypes, | ||
minLikelihood: minLikelihood, | ||
includeQuote: includeQuote, | ||
limits: { | ||
maxFindingsPerRequest: maxFindings, | ||
}, | ||
}, | ||
item: item, | ||
}; | ||
|
||
// Run request | ||
const [response] = await dlp.inspectContent(request); | ||
const findings = response.result.findings; | ||
if (findings.length > 0) { | ||
console.log('Findings:'); | ||
findings.forEach(finding => { | ||
if (includeQuote) { | ||
console.log(`\tQuote: ${finding.quote}`); | ||
} | ||
console.log(`\tInfo type: ${finding.infoType.name}`); | ||
console.log(`\tLikelihood: ${finding.likelihood}`); | ||
}); | ||
} else { | ||
console.log('No findings.'); | ||
} | ||
} | ||
inspectWithCustomRegex(); | ||
// [END dlp_inspect_custom_regex] | ||
} | ||
|
||
main(...process.argv.slice(2)); | ||
This should be after process.on to avoid missing synchronous promise rejections.
Is there any use case where this can happen? I have updated the code as you mentioned, but during testing I found the same results.
I don't think it happens right now, but I could imagine that in the future there might be synchronous validation of requests (e.g. bad enum values might throw before even making a network request).
||
process.on('unhandledRejection', err => { | ||
console.error(err.message); | ||
process.exitCode = 1; | ||
}); | ||
|
||
function transformCLI(infoTypes, customInfoTypes) { | ||
infoTypes = infoTypes | ||
? infoTypes.split(',').map(type => { | ||
return {name: type}; | ||
}) | ||
: undefined; | ||
|
||
if (customInfoTypes) { | ||
customInfoTypes = customInfoTypes.includes(',') | ||
? customInfoTypes.split(',').map((dict, idx) => { | ||
return { | ||
infoType: {name: 'CUSTOM_DICT_'.concat(idx.toString())}, | ||
dictionary: {wordList: {words: dict.split(',')}}, | ||
}; | ||
}) | ||
: customInfoTypes.split(',').map((rgx, idx) => { | ||
return { | ||
infoType: {name: 'CUSTOM_REGEX_'.concat(idx.toString())}, | ||
regex: {pattern: rgx}, | ||
}; | ||
}); | ||
} | ||
|
||
return [infoTypes, customInfoTypes]; | ||
} |
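As a quick check of how the comma handling in `transformCLI` behaves, the sketch below copies only its `customInfoTypes` branch into a standalone function. The name `parseCustomInfoTypes` is hypothetical and not part of the PR; the logic is meant to mirror the diff as shown, so treat it as an illustration rather than the sample's final behavior.

```javascript
// Standalone copy of the customInfoTypes branch of transformCLI from the diff.
// The function name parseCustomInfoTypes is hypothetical (not in the PR).
function parseCustomInfoTypes(customInfoTypes) {
  if (!customInfoTypes) {
    return customInfoTypes;
  }
  return customInfoTypes.includes(',')
    ? // Any comma in the argument selects the dictionary branch.
      customInfoTypes.split(',').map((dict, idx) => ({
        infoType: {name: 'CUSTOM_DICT_'.concat(idx.toString())},
        dictionary: {wordList: {words: dict.split(',')}},
      }))
    : // No comma: the whole argument is treated as one regex pattern.
      customInfoTypes.split(',').map((rgx, idx) => ({
        infoType: {name: 'CUSTOM_REGEX_'.concat(idx.toString())},
        regex: {pattern: rgx},
      }));
}

// A pattern without a comma becomes a single regex infoType.
console.log(JSON.stringify(parseCustomInfoTypes('[a-f]+')));

// A regex that uses a comma quantifier is split and treated as dictionaries.
console.log(JSON.stringify(parseCustomInfoTypes('\\d{3,4}')));
```

With this logic, a regex such as `\d{3,4}` is never sent as a regex: it is split at the comma into two single-word dictionaries (`\d{3` and `4}`), which is one concrete way the comma-based hybrid syntax can surprise a user.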
Can we trim this down?
- string: required
- minLikelihood: we should be able to remove this because the likelihood for a custom infotype (which is how regex is used) is controlled by the request anyway.
- maxFindings: does not make sense in the context of this sample. We're inspecting a small-ish string, not megabytes of file content, so there will only be so many findings. If you want to limit it anyway, just hardcode a limit of ~1000 in the sample. Users can change it if they need to.
- infoTypes: Can omit, since we're demonstrating regex matching only. I guess it's possible we may want to show side-by-side detection of custom and built-in infotypes, but if that's the case, move this to the end and make it optional. (Also, if that is the case, let's make the example string actually demonstrate that.)
- customInfoTypes: Since the whole point of this sample is to demonstrate regex, we should ask for the regex directly and construct the custom infotype in code.
- includeQuote: as with maxFindings, let's just set this to true for demo purposes. If users want to change it they can edit the code.

At a high level we should make the sample as easy as possible to run. Adding a lot of parameters and using obscure syntax (such as the ',' and regex/dict hybrid for customInfoTypes) will lead to confusion and frustration.
As a user of this sample, I should be able to say
node inspectWithCustomRegex.js 'this is my serial number aab-bcdd-eef' '[a-f\-]10'
and see results.
@soumya92 I had the same thought when I started the implementation but I noticed a particular structure is being followed for all inspect samples. Couldn't figure out the exact reason but mostly it was to keep the sample code consistent. Anyway, I feel your findings look reasonable and so I have updated this sample. Also, will it be okay if I make these same changes in my other PRs?
Yeah go for it! The easier we make our samples to use, the better
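
For illustration, the trimmed-down interface discussed in this thread might look roughly like the sketch below. It is not the code that was actually merged: the helper `buildInspectRequest`, the info type name `CUSTOM_REGEX`, and the hardcoded limit of 1000 are assumptions drawn from the review suggestions, and it also assumes the Node.js client accepts enum names such as `'POSSIBLE'` and `'TEXT_UTF8'` as strings in place of the `DLP.protos` constants used in the diff.

```javascript
// Sketch only: a trimmed-down variant that takes just a string and a regex.
// buildInspectRequest and the name 'CUSTOM_REGEX' are hypothetical.
function buildInspectRequest(projectId, string, regexPattern) {
  return {
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: {
      // The custom infoType is constructed in code from the raw regex.
      customInfoTypes: [
        {
          infoType: {name: 'CUSTOM_REGEX'},
          regex: {pattern: regexPattern},
          // Assumption: enum names are accepted as strings by the client.
          likelihood: 'POSSIBLE',
        },
      ],
      // Hardcoded for demo purposes, as suggested in the review.
      includeQuote: true,
      limits: {maxFindingsPerRequest: 1000},
    },
    item: {
      byteItem: {type: 'TEXT_UTF8', data: Buffer.from(string, 'utf-8')},
    },
  };
}

async function inspectWithCustomRegex(projectId, string, regexPattern) {
  // Loaded lazily so the request builder can be used without the package.
  const DLP = require('@google-cloud/dlp');
  const dlp = new DLP.DlpServiceClient();

  const request = buildInspectRequest(projectId, string, regexPattern);
  const [response] = await dlp.inspectContent(request);
  const findings = response.result.findings;
  if (findings.length === 0) {
    console.log('No findings.');
    return;
  }
  console.log('Findings:');
  findings.forEach(finding => {
    console.log(`\tQuote: ${finding.quote}`);
    console.log(`\tInfo type: ${finding.infoType.name}`);
    console.log(`\tLikelihood: ${finding.likelihood}`);
  });
}
```

A caller would then only need a project ID, the text, and the pattern, for example `inspectWithCustomRegex('my-project', 'Patients MRN 444-5-22222', '[1-9]{3}-[1-9]{1}-[1-9]{5}')`, where the pattern is an illustrative value rather than one taken from the PR.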