Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support named parameters #10

Closed
dynajoe opened this issue Apr 2, 2015 · 32 comments
Closed

Support named parameters #10

dynajoe opened this issue Apr 2, 2015 · 32 comments

Comments

@dynajoe
Copy link

dynajoe commented Apr 2, 2015

It'd be nice to specify named parameters in the query that references properties on an object passed in place of the array of values that would normally map to $1, $2, etc.

db.one("select * from users where id=:id", {id: 123}) // find the user from id;
    .then(function(data){
        // find 'login' records for the user found:
        return db.query("select * from audit where event=:event and userId=:userId",
        {event: "login", userId: data.id});
    })
    .then(function(data){
        // display found audit records;
        console.log(data);
    }, function(reason){
        console.log(reason); // display reason why the call failed;
    })

I created a small wrapper for the plain postgres module to do this for my internal project. I really like how your library looks and think this would be an excellent addition to essentially make my project obsolete.

https://github.com/joeandaverde/tinypg

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 2, 2015

@joeandaverde, I think it is generally a good idea, but your implementation for the variable syntax reminds me of an issue we had recently with the $1,$2 variables when it comes to long variable names.

When we encounter $1, $2 variables, we make sure they are not followed by a digit, i.e. they are not $11, $22, or something like that, because those are then different variables. But in your case there cannot be a truncating symbol to be identified, because you use regular letters for your variables. As a result, how could you possibly tell the difference between variables :id and :idiom?

Now, if you were to suggest using something like this: ${id} or even :id:, there wouldn't be any question, as it clearly identifies the delimiters, but you don't seem to use any, which creates a problem.

Any thoughts on that? Any existing standard for that, perhaps, as a reference?

@dynajoe
Copy link
Author

dynajoe commented Apr 2, 2015

There's a lot to be desired in the regular expression that's breaking up the SQL statement. Currently it's simply using '.split(/(:\w+)/)' which would work in the cases you mentioned.

> 'SELECT * FROM users where id = :id and idiom = :idiom'.split(/(:\w+)/)
[ 'SELECT * FROM users where id = ',
  ':id',
  ' and idiom = ',
  ':idiom',
  '' ]
>

Where it would fail is if you wanted to use the full range of valid Ecmascript variable names. This is because Ecmascript allows for unicode variable names (see: https://mathiasbynens.be/notes/javascript-identifiers-es6). Another possible issue is that object keys can be any arbitrary string. This would also fail to match :\w+.

I'm currently using [A-Za-z0-9_]+ (aka \w+ see: http://www.w3schools.com/jsref/jsref_regexp_wordchar.asp) which handles most sane variable names.

@dynajoe
Copy link
Author

dynajoe commented Apr 2, 2015

I'm also up to find a more resilient way of representing variable insertion.

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 2, 2015

I'm going to look around for any kind of standard way for identifying a variable in that case, and it would help if you did too ;)

Also, I'd like to avoid enumerating an object-parameter for its properties, because it may have lots more than just the parameters needed, and I believe that using the query as the source for variable names is the way to go, i.e. we search through the query using regex to make the list of all variables to be replaced, and only then use the object to pull its properties, and not the other way round.

However, this does imply use of strict variable syntax, to avoid misidentifying them.

@dynajoe
Copy link
Author

dynajoe commented Apr 2, 2015

Another issue with my solution is that it doesn't account for postgres type casts. However, this can probably be fixed with a negative look behind for :. However, that's not supported in JavaScript's RegExp.

select 1::int

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 2, 2015

Yes, this does create an ambiguity, with Postgres using :: for type casting.

@dynajoe
Copy link
Author

dynajoe commented Apr 2, 2015

In regards to your previous message about enumerating object properties:

https://github.com/joeandaverde/tinypg/blob/8cab154a9652fe15ba087688171ebc3f5afa02d3/index.js#L60-L62

This particular implementation pulls a mapping of $1 to variable name. It iterates those instead of the object parameter to extract values into an array.

@dynajoe
Copy link
Author

dynajoe commented Apr 2, 2015

It would be ideal not to collide with ES6 template strings to avoid needing to escape:

https://babeljs.io/docs/learn-es6/#template-strings

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 2, 2015

Thank you for that link. The best syntax I can think of for now is ${propertyName}, as the most standard and reliable way for injecting object properties.

If you find a better one - please let me know ;)

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 2, 2015

Just so, about your concern of creating a conflict: http://updates.html5rocks.com/2015/01/ES6-Template-Strings

According to that, ES6 specification requires use of `` symbols to declare a template, as opposed to '' simple quotes, and since we are using regular strings to inject variables, there wouldn't be a conflict with ES6 Template Strings, even if we use the exact same syntax for variables.

In fact, I see something more of a positive thing that we would have it compatible with ES6 that way ;)

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

I'm going to take some time to think about the best approach to doing this. But in the meantime, I was playing with the idea of how to pull names of all variables-properties in a query for the syntax of ${varName}, and I came up with the following code:

/////////////////////////////////////////////////////////////
// Returns array of unique variable names in a text string;
//
// - variables are defined using syntax: ${varName};
// - a valid variable starts with a letter or underscore symbol,
//   followed by any combination of letters, digits or underscores.
// - a variable can have any number of leading and trailing spaces;
function enumVars(txt) {
    var v, names = [];
    var reg = /\$\{\s*[a-zA-Z_][a-zA-Z0-9_]*\s*}/g;
    while (v = reg.exec(txt)) {
        var svn = v[0].replace(/[\$\{\s}]/g, ''); // stripped variable name;
        if(names.indexOf(svn) === -1) {
            names.push(svn);
        }
    }
    return names;
}

// example:
console.log(enumVars("${var_1} ${  _var2_  } ${ var3_} ${ var_1 }"));
// outputs: [ 'var_1', '_var2_', 'var3_' ]

The next step would be just to pull values of the corresponding properties from the object-parameter and do the replacements. Easy! :)

@vitaly-t vitaly-t self-assigned this Apr 3, 2015
@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

I have created a new branch named-params, to start working in this direction, with the above method checked in, as a starting point.

@dynajoe
Copy link
Author

dynajoe commented Apr 3, 2015

I originally started with regex exec. When I got to replacement's I found it was easier using the split method on string (using a capture group in order to not lose variable name). This made it easier to map Postgres vars to property names.

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

I'd like to see that easier way. I know that with RegEx it would be just one line of code, using the same RegEx I used to search for variables. I'm not sure the same simplicity can be accomplished with simple string functions, considering all the complex cases of white spaces that RegEx takes care of automatically. Please show me your simpler alternative.

@dynajoe
Copy link
Author

dynajoe commented Apr 3, 2015

Can't edit from phone or link to line. But you can see an example here in the function parseSql: https://github.com/joeandaverde/tinypg/blob/master/index.js

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

It is difficult to understand the way it works there exactly, except for the fact that it deals with a different syntax other than ${varName} and that it does a replacement right there?

Would you like to create a version of it that produces the same result as one I wrote above?

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

I have finished the initial implementation for supporting named parameters in the new branch - check it out ;)

For the time being I'm using my own RegEx approach, and I included one complex test there, see in file: indexSpec.js:

    it("must correctly format named parameters or throw an error", function () {
        // - correctly handles leading and trailing spaces;
        // - supports underscores and digits in names;
        // - can join variables values next to each other;
        // - converts all simple types correctly;
        // - replaces undefined variables with null;
        // - variables are case-sensitive;
        expect(pgp.as.format("${ NamE_},${d_o_b },${  _active__},${ file_5A  }${__Balance}", {
            NamE_: "John O'Connor",
            d_o_b: dateSample,
            _active__: true,
            file_5a: 'something', // not to be found, due to case difference;
            __Balance: -123.45
        })).toBe("'John O''Connor','" + dateSample.toUTCString() + "',TRUE,null-123.45");
    });

If you think that you can replace the search-and-replace approach with a better one - give it a go, you have all the code, and even a good test to begin with.

Other than that, before the branch is ready to be merged into master, I will be doing a number updates covering tests, documentation, and possibly implementation, if something better comes along.

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

P.S. Please star this package, if you like :) It helps with the search result on npm ;)

@vitaly-t vitaly-t added this to the Named Parameters milestone Apr 3, 2015
@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

As there were no further comments, I finished all tests and documentation updates and released version 0.8.0 with the new feature. Details are in the release notes.

@vitaly-t vitaly-t closed this as completed Apr 3, 2015
@dynajoe
Copy link
Author

dynajoe commented Apr 3, 2015

I think we were talking about something completely different. What you implemented was formatted strings. My original request was to implement a feature to allow for parameterized SQL queries using named parameters. In order to accomplish this safely (not using string interpolation as this opens a gaping hole for SQL injection) there needs to be a way to map $1, $2 etc to array elements.

It's worth mentioning again that the implementation as is condones unsanitized SQL queries.

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

To my knowledge, there is no such thing for that which you want. Postgres doesn't support named parameters. See this: brianc/node-postgres#268

And SQL injection isn't a real problem on its own. One has to create a gap in URL-to variable mapping, or otherwise do a poor design overall. Therefore I'm not overly concerned about it.

I extended the syntax to support named variables from objects, which works now, and this is about as good as it's gonna get :) I don't see how i can make it any better.

The current code for 0.8.1 will be released soon, with one change to parameter formatting - it will throw an error when a variable doesn't exist, as opposed to formatting it as null right now.

And I don't see how possibly we could have been talking about something else when even your references to your own library suggested exactly what was done here.

@dynajoe
Copy link
Author

dynajoe commented Apr 3, 2015

I'm not suggesting named variables IN the final SQL query.

db.query('SELECT * FROM users WHERE name = :name and age > :min_age', { name: 'joe', min_age: 30 })

would get transformed by your library to be

db.query('SELECT * FROM users WHERE name = $1 and age > $2', ['joe', 30])

This type of query can be sanitized by the postgres database engine whereas one which allows for string substitution would not. This is bad:

db.query("SELECT * FROM users WHERE name = 'joe' and age > 30")

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

This type of query can be sanitized by the postgres database engine whereas one which allows for string substitution would not.

What are you talking about? Postgres doesn't take any variable parameters. Nothing can be sanitized by Postgres database engine. Only the final query can be passed from the client, nothing else.

@dynajoe
Copy link
Author

dynajoe commented Apr 3, 2015

The parameters passed to Postgres can be injected into the query in a safe manor for example:

as.format('SELECT * FROM users WHERE name = ${name} and age > ${min_age}', {name: '\'1\'; DROP TABLE users;--', min_age: 30 })

would become

'SELECT * FROM users WHERE name = \'1\'; DROP TABLE users;-- and age > 30'

If instead those parameters are passed to the Postgres engine ($1, $2 => ['Joe', 30]), it can do the appropriate escaping of malicious characters.

@dynajoe
Copy link
Author

dynajoe commented Apr 3, 2015

Here's an example of the Postgres log of similar query.

LOG:  statement: ROLLBACK
LOG:  statement: BEGIN
LOG:  execute <unnamed>: INSERT INTO bottle.inbound_message (source, raw_text)
    VALUES ($1, $2)
    RETURNING *
DETAIL:  parameters: $1 = 'xxx', $2 = 'TEST MESSAGE'

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

How do you pass $1, $2 into Postgres when it doesn't have a protocol for doing this?

@dynajoe
Copy link
Author

dynajoe commented Apr 3, 2015

It does. That's what you're seeing above. Directly from the Postgres logs.

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

Please, point at the Postgres online documentation about passing parameters with a query, because I've never heard of such thing's existence.

P.S. I'm done for the day, back to this tomorrow.

@dynajoe
Copy link
Author

dynajoe commented Apr 3, 2015

http://www.postgresql.org/docs/9.2/static/plpgsql-statements.html

scroll down half way (39.5.4. Executing Dynamic Commands)

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 3, 2015

Those are local formatting commands, with specific use of EXECUTE query USING, for use from inside functions only. It is unusable in our case.

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 4, 2015

It seems that PG now has support for Prepared Statements, which may be what you are looking for.

By the way, if you like PG implementation of parameter formatting better, you can always use that instead. This library has always allowed that, see Initialization Options, parameter pgFormatting.

@vitaly-t
Copy link
Owner

@joeandaverde Unrelated to this, but if you liked this library, I really appreciate if you can give me a feedback on this small addition: pg-monitor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants