Commit 9c2a506

Merge pull request #583 from EgoAleSum/email-patch
normalizeEmail improvements
2 parents 3266767 + e12cd97 commit 9c2a506

6 files changed

+538
-38
lines changed

README.md

+12-1
@@ -105,7 +105,18 @@ Passing anything other than a string is an error.
 - **escape(input)** - replace `<`, `>`, `&`, `'`, `"` and `/` with HTML entities.
 - **unescape(input)** - replaces HTML encoded entities with `<`, `>`, `&`, `'`, `"` and `/`.
 - **ltrim(input [, chars])** - trim characters from the left-side of the input.
-- **normalizeEmail(email [, options])** - canonicalize an email address. `options` is an object which defaults to `{ lowercase: true, remove_dots: true, remove_extension: true }`. With `lowercase` set to `true`, the local part of the email address is lowercased for all domains; the hostname is always lowercased and the local part of the email address is always lowercased for hosts that are known to be case-insensitive (currently only GMail). Normalization follows special rules for known providers: currently, GMail addresses have dots removed in the local part and are stripped of extensions (e.g. `[email protected]` becomes `[email protected]`) and all `@googlemail.com` addresses are normalized to `@gmail.com`.
+- **normalizeEmail(email [, options])** - canonicalizes an email address. `options` is an object with the following keys and default values:
+    - *all_lowercase: true* - Transforms the local part (before the @ symbol) of all email addresses to lowercase. Please note that this may violate RFC 5321, which gives providers the possibility to treat the local part of email addresses in a case sensitive way (although in practice most - yet not all - providers don't). The domain part of the email address is always lowercased, as it's case insensitive per RFC 1035.
+    - *gmail_lowercase: true* - GMail addresses are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. Please note that when *all_lowercase* is true, GMail addresses are lowercased regardless of the value of this setting.
+    - *gmail_remove_dots: true*: Removes dots from the local part of the email address, as GMail ignores them (e.g. "john.doe" and "johndoe" are considered equal).
+    - *gmail_remove_subaddress: true*: Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "[email protected]" becomes "[email protected]").
+    - *gmail_convert_googlemaildotcom: true*: Converts addresses with domain @googlemail.com to @gmail.com, as they're equivalent.
+    - *outlookdotcom_lowercase: true* - Outlook.com addresses (including Windows Live and Hotmail) are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. Please note that when *all_lowercase* is true, Outlook.com addresses are lowercased regardless of the value of this setting.
+    - *outlookdotcom_remove_subaddress: true*: Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "[email protected]" becomes "[email protected]").
+    - *yahoo_lowercase: true* - Yahoo Mail addresses are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. Please note that when *all_lowercase* is true, Yahoo Mail addresses are lowercased regardless of the value of this setting.
+    - *yahoo_remove_subaddress: true*: Normalizes addresses by removing "sub-addresses", which is the part following a "-" sign (e.g. "[email protected]" becomes "[email protected]").
+    - *icloud_lowercase: true* - iCloud addresses (including MobileMe) are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. Please note that when *all_lowercase* is true, iCloud addresses are lowercased regardless of the value of this setting.
+    - *icloud_remove_subaddress: true*: Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "[email protected]" becomes "[email protected]").
 - **rtrim(input [, chars])** - trim characters from the right-side of the input.
 - **stripLow(input [, keep_new_lines])** - remove characters with a numerical value < 32 and 127, mostly control characters. If `keep_new_lines` is `true`, newline characters are preserved (`\n` and `\r`, hex `0xA` and `0xD`). Unicode-safe in JavaScript.
 - **toBoolean(input [, strict])** - convert the input string to a boolean. Everything except for `'0'`, `'false'` and `''` returns `true`. In strict mode only `'1'` and `'true'` return `true`.
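
The README text for `normalizeEmail` describes how *all_lowercase* overrides the per-provider `*_lowercase` switches. A minimal sketch of that precedence follows. It is not the library's code: the helper name `shouldLowercaseLocalPart` and its `provider` argument are hypothetical, and only the option keys are taken from the README.

```javascript
// Hypothetical helper (not part of the library): decides whether the local
// part of an address would be lowercased under the documented options.
// `provider` is 'gmail', 'outlookdotcom', 'yahoo', 'icloud', or null for
// any other domain.
function shouldLowercaseLocalPart(options, provider) {
  if (options.all_lowercase) {
    return true; // the global switch wins regardless of provider settings
  }
  if (provider === null) {
    return false; // unknown providers are only affected by all_lowercase
  }
  return Boolean(options[provider + '_lowercase']);
}

const opts = { all_lowercase: false, gmail_lowercase: true, yahoo_lowercase: false };
console.log(shouldLowercaseLocalPart(opts, 'gmail')); // true
console.log(shouldLowercaseLocalPart(opts, 'yahoo')); // false
console.log(shouldLowercaseLocalPart(opts, null));    // false
```

With *all_lowercase* left at its default of `true`, every call above would return `true`, which matches the README's note that the provider switches only matter when *all_lowercase* is disabled.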

lib/normalizeEmail.js

+96-9
@@ -16,32 +16,119 @@ var _merge2 = _interopRequireDefault(_merge);
 function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { default: obj }; }
 
 var default_normalize_email_options = {
-  lowercase: true,
-  remove_dots: true,
-  remove_extension: true
+  // The following options apply to all email addresses
+  // Lowercases the local part of the email address.
+  // Please note this may violate RFC 5321 as per http://stackoverflow.com/a/9808332/192024.
+  // The domain is always lowercased, as per RFC 1035
+  all_lowercase: true,
+
+  // The following conversions are specific to GMail
+  // Lowercases the local part of the GMail address (known to be case-insensitive)
+  gmail_lowercase: true,
+  // Removes dots from the local part of the email address, as that's ignored by GMail
+  gmail_remove_dots: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  gmail_remove_subaddress: true,
+  // Converts the googlemail.com domain to gmail.com
+  gmail_convert_googlemaildotcom: true,
+
+  // The following conversions are specific to Outlook.com / Windows Live / Hotmail
+  // Lowercases the local part of the Outlook.com address (known to be case-insensitive)
+  outlookdotcom_lowercase: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  outlookdotcom_remove_subaddress: true,
+
+  // The following conversions are specific to Yahoo
+  // Lowercases the local part of the Yahoo address (known to be case-insensitive)
+  yahoo_lowercase: true,
+  // Removes the subaddress (e.g. "-foo") from the email address
+  yahoo_remove_subaddress: true,
+
+  // The following conversions are specific to iCloud
+  // Lowercases the local part of the iCloud address (known to be case-insensitive)
+  icloud_lowercase: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  icloud_remove_subaddress: true
 };
 
+// List of domains used by iCloud
+var icloud_domains = ['icloud.com', 'me.com'];
+
+// List of domains used by Outlook.com and its predecessors
+// This list is likely incomplete.
+// Partial reference:
+// https://blogs.office.com/2013/04/17/outlook-com-gets-two-step-verification-sign-in-by-alias-and-new-international-domains/
+var outlookdotcom_domains = ['hotmail.at', 'hotmail.be', 'hotmail.ca', 'hotmail.cl', 'hotmail.co.il', 'hotmail.co.nz', 'hotmail.co.th', 'hotmail.co.uk', 'hotmail.com', 'hotmail.com.ar', 'hotmail.com.au', 'hotmail.com.br', 'hotmail.com.gr', 'hotmail.com.mx', 'hotmail.com.pe', 'hotmail.com.tr', 'hotmail.com.vn', 'hotmail.cz', 'hotmail.de', 'hotmail.dk', 'hotmail.es', 'hotmail.fr', 'hotmail.hu', 'hotmail.id', 'hotmail.ie', 'hotmail.in', 'hotmail.it', 'hotmail.jp', 'hotmail.kr', 'hotmail.lv', 'hotmail.my', 'hotmail.ph', 'hotmail.pt', 'hotmail.sa', 'hotmail.sg', 'hotmail.sk', 'live.be', 'live.co.uk', 'live.com', 'live.com.ar', 'live.com.mx', 'live.de', 'live.es', 'live.eu', 'live.fr', 'live.it', 'live.nl', 'msn.com', 'outlook.at', 'outlook.be', 'outlook.cl', 'outlook.co.il', 'outlook.co.nz', 'outlook.co.th', 'outlook.com', 'outlook.com.ar', 'outlook.com.au', 'outlook.com.br', 'outlook.com.gr', 'outlook.com.pe', 'outlook.com.tr', 'outlook.com.vn', 'outlook.cz', 'outlook.de', 'outlook.dk', 'outlook.es', 'outlook.fr', 'outlook.hu', 'outlook.id', 'outlook.ie', 'outlook.in', 'outlook.it', 'outlook.jp', 'outlook.kr', 'outlook.lv', 'outlook.my', 'outlook.ph', 'outlook.pt', 'outlook.sa', 'outlook.sg', 'outlook.sk', 'passport.com'];
+
+// List of domains used by Yahoo Mail
+// This list is likely incomplete
+var yahoo_domains = ['rocketmail.com', 'yahoo.ca', 'yahoo.co.uk', 'yahoo.com', 'yahoo.de', 'yahoo.fr', 'yahoo.in', 'yahoo.it', 'ymail.com'];
+
 function normalizeEmail(email, options) {
   options = (0, _merge2.default)(options, default_normalize_email_options);
+
   if (!(0, _isEmail2.default)(email)) {
     return false;
   }
   var parts = email.split('@', 2);
+
+  // The domain is always lowercased, as it's case-insensitive per RFC 1035
   parts[1] = parts[1].toLowerCase();
+
   if (parts[1] === 'gmail.com' || parts[1] === 'googlemail.com') {
-    if (options.remove_extension) {
+    // Address is GMail
+    if (options.gmail_remove_subaddress) {
       parts[0] = parts[0].split('+')[0];
     }
-    if (options.remove_dots) {
+    if (options.gmail_remove_dots) {
       parts[0] = parts[0].replace(/\./g, '');
     }
     if (!parts[0].length) {
       return false;
     }
-    parts[0] = parts[0].toLowerCase();
-    parts[1] = 'gmail.com';
-  } else if (options.lowercase) {
-    parts[0] = parts[0].toLowerCase();
+    if (options.all_lowercase || options.gmail_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+    parts[1] = options.gmail_convert_googlemaildotcom ? 'gmail.com' : parts[1];
+  } else if (~icloud_domains.indexOf(parts[1])) {
+    // Address is iCloud
+    if (options.icloud_remove_subaddress) {
+      parts[0] = parts[0].split('+')[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.icloud_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else if (~outlookdotcom_domains.indexOf(parts[1])) {
+    // Address is Outlook.com
+    if (options.outlookdotcom_remove_subaddress) {
+      parts[0] = parts[0].split('+')[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.outlookdotcom_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else if (~yahoo_domains.indexOf(parts[1])) {
+    // Address is Yahoo
+    if (options.yahoo_remove_subaddress) {
+      var components = parts[0].split('-');
+      parts[0] = components.length > 1 ? components.slice(0, -1).join('-') : components[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.yahoo_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else {
+    // Any other address
+    if (options.all_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
   }
   return parts.join('@');
 }
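
The diff uses two different subaddress-stripping strategies: GMail, Outlook.com and iCloud keep everything before the first "+", while the Yahoo branch drops only the last "-"-separated component. The sketch below isolates the two expressions so their behavior can be compared side by side. The function names are illustrative and do not exist in the library.

```javascript
// Illustrative helpers extracted from the expressions in the diff above.

// GMail, Outlook.com, iCloud: keep everything before the first "+".
function stripPlusSubaddress(local) {
  return local.split('+')[0];
}

// Yahoo: drop only the last "-"-separated component, so hyphens that
// belong to the base name are preserved.
function stripHyphenSubaddress(local) {
  const components = local.split('-');
  return components.length > 1 ? components.slice(0, -1).join('-') : components[0];
}

console.log(stripPlusSubaddress('jane+news+promo')); // "jane"
console.log(stripHyphenSubaddress('jane-doe-news')); // "jane-doe"
console.log(stripHyphenSubaddress('jane'));          // "jane"
console.log(stripHyphenSubaddress('-news') === '');  // true
```

The last case shows why each provider branch re-checks `parts[0].length` after stripping: a local part that consists only of a subaddress collapses to an empty string, and `normalizeEmail` then returns `false`.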

src/lib/normalizeEmail.js

+192-9
@@ -2,32 +2,215 @@ import isEmail from './isEmail';
 import merge from './util/merge';
 
 const default_normalize_email_options = {
-  lowercase: true,
-  remove_dots: true,
-  remove_extension: true,
+  // The following options apply to all email addresses
+  // Lowercases the local part of the email address.
+  // Please note this may violate RFC 5321 as per http://stackoverflow.com/a/9808332/192024.
+  // The domain is always lowercased, as per RFC 1035
+  all_lowercase: true,
+
+  // The following conversions are specific to GMail
+  // Lowercases the local part of the GMail address (known to be case-insensitive)
+  gmail_lowercase: true,
+  // Removes dots from the local part of the email address, as that's ignored by GMail
+  gmail_remove_dots: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  gmail_remove_subaddress: true,
+  // Converts the googlemail.com domain to gmail.com
+  gmail_convert_googlemaildotcom: true,
+
+  // The following conversions are specific to Outlook.com / Windows Live / Hotmail
+  // Lowercases the local part of the Outlook.com address (known to be case-insensitive)
+  outlookdotcom_lowercase: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  outlookdotcom_remove_subaddress: true,
+
+  // The following conversions are specific to Yahoo
+  // Lowercases the local part of the Yahoo address (known to be case-insensitive)
+  yahoo_lowercase: true,
+  // Removes the subaddress (e.g. "-foo") from the email address
+  yahoo_remove_subaddress: true,
+
+  // The following conversions are specific to iCloud
+  // Lowercases the local part of the iCloud address (known to be case-insensitive)
+  icloud_lowercase: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  icloud_remove_subaddress: true,
 };
 
+// List of domains used by iCloud
+const icloud_domains = [
+  'icloud.com',
+  'me.com',
+];
+
+// List of domains used by Outlook.com and its predecessors
+// This list is likely incomplete.
+// Partial reference:
+// https://blogs.office.com/2013/04/17/outlook-com-gets-two-step-verification-sign-in-by-alias-and-new-international-domains/
+const outlookdotcom_domains = [
+  'hotmail.at',
+  'hotmail.be',
+  'hotmail.ca',
+  'hotmail.cl',
+  'hotmail.co.il',
+  'hotmail.co.nz',
+  'hotmail.co.th',
+  'hotmail.co.uk',
+  'hotmail.com',
+  'hotmail.com.ar',
+  'hotmail.com.au',
+  'hotmail.com.br',
+  'hotmail.com.gr',
+  'hotmail.com.mx',
+  'hotmail.com.pe',
+  'hotmail.com.tr',
+  'hotmail.com.vn',
+  'hotmail.cz',
+  'hotmail.de',
+  'hotmail.dk',
+  'hotmail.es',
+  'hotmail.fr',
+  'hotmail.hu',
+  'hotmail.id',
+  'hotmail.ie',
+  'hotmail.in',
+  'hotmail.it',
+  'hotmail.jp',
+  'hotmail.kr',
+  'hotmail.lv',
+  'hotmail.my',
+  'hotmail.ph',
+  'hotmail.pt',
+  'hotmail.sa',
+  'hotmail.sg',
+  'hotmail.sk',
+  'live.be',
+  'live.co.uk',
+  'live.com',
+  'live.com.ar',
+  'live.com.mx',
+  'live.de',
+  'live.es',
+  'live.eu',
+  'live.fr',
+  'live.it',
+  'live.nl',
+  'msn.com',
+  'outlook.at',
+  'outlook.be',
+  'outlook.cl',
+  'outlook.co.il',
+  'outlook.co.nz',
+  'outlook.co.th',
+  'outlook.com',
+  'outlook.com.ar',
+  'outlook.com.au',
+  'outlook.com.br',
+  'outlook.com.gr',
+  'outlook.com.pe',
+  'outlook.com.tr',
+  'outlook.com.vn',
+  'outlook.cz',
+  'outlook.de',
+  'outlook.dk',
+  'outlook.es',
+  'outlook.fr',
+  'outlook.hu',
+  'outlook.id',
+  'outlook.ie',
+  'outlook.in',
+  'outlook.it',
+  'outlook.jp',
+  'outlook.kr',
+  'outlook.lv',
+  'outlook.my',
+  'outlook.ph',
+  'outlook.pt',
+  'outlook.sa',
+  'outlook.sg',
+  'outlook.sk',
+  'passport.com',
+];
+
+// List of domains used by Yahoo Mail
+// This list is likely incomplete
+const yahoo_domains = [
+  'rocketmail.com',
+  'yahoo.ca',
+  'yahoo.co.uk',
+  'yahoo.com',
+  'yahoo.de',
+  'yahoo.fr',
+  'yahoo.in',
+  'yahoo.it',
+  'ymail.com',
+];
+
 export default function normalizeEmail(email, options) {
   options = merge(options, default_normalize_email_options);
+
   if (!isEmail(email)) {
     return false;
   }
   const parts = email.split('@', 2);
+
+  // The domain is always lowercased, as it's case-insensitive per RFC 1035
   parts[1] = parts[1].toLowerCase();
+
   if (parts[1] === 'gmail.com' || parts[1] === 'googlemail.com') {
-    if (options.remove_extension) {
+    // Address is GMail
+    if (options.gmail_remove_subaddress) {
       parts[0] = parts[0].split('+')[0];
     }
-    if (options.remove_dots) {
+    if (options.gmail_remove_dots) {
       parts[0] = parts[0].replace(/\./g, '');
     }
     if (!parts[0].length) {
       return false;
     }
-    parts[0] = parts[0].toLowerCase();
-    parts[1] = 'gmail.com';
-  } else if (options.lowercase) {
-    parts[0] = parts[0].toLowerCase();
+    if (options.all_lowercase || options.gmail_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+    parts[1] = options.gmail_convert_googlemaildotcom ? 'gmail.com' : parts[1];
+  } else if (~icloud_domains.indexOf(parts[1])) {
+    // Address is iCloud
+    if (options.icloud_remove_subaddress) {
+      parts[0] = parts[0].split('+')[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.icloud_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else if (~outlookdotcom_domains.indexOf(parts[1])) {
+    // Address is Outlook.com
+    if (options.outlookdotcom_remove_subaddress) {
+      parts[0] = parts[0].split('+')[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.outlookdotcom_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else if (~yahoo_domains.indexOf(parts[1])) {
+    // Address is Yahoo
+    if (options.yahoo_remove_subaddress) {
+      let components = parts[0].split('-');
+      parts[0] = (components.length > 1) ? components.slice(0, -1).join('-') : components[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.yahoo_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else {
+    // Any other address
+    if (options.all_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
   }
   return parts.join('@');
 }
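
The provider branches test list membership with `~list.indexOf(domain)`. For readers unfamiliar with the idiom, here is a small sketch of how it evaluates, next to an equivalent written with `Array.prototype.includes`. The helper names and the shortened domain list are illustrative only; the commit itself uses the `~indexOf` form.

```javascript
// Illustrative subset of the iCloud domain list from the diff above.
const icloudDomains = ['icloud.com', 'me.com'];

// indexOf returns -1 when the item is absent, and ~(-1) === 0 (falsy).
// Any real index i gives ~i === -(i + 1), which is non-zero (truthy).
function hasDomainTilde(list, domain) {
  return Boolean(~list.indexOf(domain));
}

// Equivalent membership test using Array.prototype.includes (ES2016).
function hasDomainIncludes(list, domain) {
  return list.includes(domain);
}

console.log(hasDomainTilde(icloudDomains, 'icloud.com'));     // true
console.log(hasDomainTilde(icloudDomains, 'example.com'));    // false
console.log(hasDomainIncludes(icloudDomains, 'me.com'));      // true
console.log(hasDomainIncludes(icloudDomains, 'example.com')); // false
```

Both forms compare with strict equality, so a domain must already be lowercased before the lookup, which is why the function lowercases `parts[1]` before any provider check.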
