-
-
Notifications
You must be signed in to change notification settings - Fork 34.4k
gh-73936: Support unicode passwords in smtplib #103611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,113 @@ | ||||||||
| # Copyright 2016-present MongoDB, Inc. | ||||||||
| # | ||||||||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||||||||
| # you may not use this file except in compliance with the License. | ||||||||
| # You may obtain a copy of the License at | ||||||||
| # | ||||||||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||||||||
| # | ||||||||
| # Unless required by applicable law or agreed to in writing, software | ||||||||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||||||||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||||
| # See the License for the specific language governing permissions and | ||||||||
| # limitations under the License. | ||||||||
|
|
||||||||
| """An implementation of RFC4013 SASLprep.""" | ||||||||
| from typing import Any, Optional | ||||||||
|
|
||||||||
| try: | ||||||||
| import stringprep | ||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. stringprep is available in all supported versions of python. |
||||||||
| except ImportError: | ||||||||
| HAVE_STRINGPREP = False | ||||||||
|
|
||||||||
| def saslprep(data: Any, prohibit_unassigned_code_points: Optional[bool] = True) -> str: | ||||||||
| """SASLprep dummy""" | ||||||||
| if isinstance(data, str): | ||||||||
| raise TypeError( | ||||||||
| "The stringprep module is not available. Usernames and " | ||||||||
| "passwords must be instances of bytes." | ||||||||
| ) | ||||||||
| return data | ||||||||
|
|
||||||||
| else: | ||||||||
| HAVE_STRINGPREP = True | ||||||||
| import unicodedata | ||||||||
|
|
||||||||
| # RFC4013 section 2.3 prohibited output. | ||||||||
| _PROHIBITED = ( | ||||||||
| # A strict reading of RFC 4013 requires table c12 here, but | ||||||||
| # characters from it are mapped to SPACE in the Map step. Can | ||||||||
| # normalization reintroduce them somehow? | ||||||||
| stringprep.in_table_c12, | ||||||||
| stringprep.in_table_c21_c22, | ||||||||
| stringprep.in_table_c3, | ||||||||
| stringprep.in_table_c4, | ||||||||
| stringprep.in_table_c5, | ||||||||
| stringprep.in_table_c6, | ||||||||
| stringprep.in_table_c7, | ||||||||
| stringprep.in_table_c8, | ||||||||
| stringprep.in_table_c9, | ||||||||
| ) | ||||||||
|
|
||||||||
| def saslprep(data: Any, prohibit_unassigned_code_points: Optional[bool] = True) -> str: | ||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The |
||||||||
| """An implementation of RFC4013 SASLprep. | ||||||||
|
|
||||||||
| :Parameters: | ||||||||
| - `data`: The string to SASLprep. Unicode strings | ||||||||
| (:class:`str`) are supported. Byte strings | ||||||||
| (:class:`bytes`) are ignored. | ||||||||
| - `prohibit_unassigned_code_points`: True / False. RFC 3454 | ||||||||
| and RFCs for various SASL mechanisms distinguish between | ||||||||
| `queries` (unassigned code points allowed) and | ||||||||
| `stored strings` (unassigned code points prohibited). Defaults | ||||||||
|
Comment on lines
+59
to
+62
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO, for the argument name it would be better to refer to the use case (something like |
||||||||
| to ``True`` (unassigned code points are prohibited). | ||||||||
|
|
||||||||
| :Returns: | ||||||||
| The SASLprep'ed version of `data`. | ||||||||
| """ | ||||||||
| prohibited: Any | ||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This does nothing.
Suggested change
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems to be there to silence a type checker. Type checkers in Python consider tuples structures, not immutable lists. Therefore concatenating tuples changes their type. Instead of removing the line, I'd provide a better annotation. Something like:
Suggested change
|
||||||||
|
|
||||||||
| if not isinstance(data, str): | ||||||||
| return data | ||||||||
|
|
||||||||
| if prohibit_unassigned_code_points: | ||||||||
| prohibited = _PROHIBITED + (stringprep.in_table_a1,) | ||||||||
| else: | ||||||||
| prohibited = _PROHIBITED | ||||||||
|
|
||||||||
| # RFC3454 section 2, step 1 - Map | ||||||||
| # RFC4013 section 2.1 mappings | ||||||||
| # Map Non-ASCII space characters to SPACE (U+0020). Map | ||||||||
| # commonly mapped to nothing characters to, well, nothing. | ||||||||
| in_table_c12 = stringprep.in_table_c12 | ||||||||
| in_table_b1 = stringprep.in_table_b1 | ||||||||
| data = "".join( | ||||||||
| ["\u0020" if in_table_c12(elt) else elt for elt in data if not in_table_b1(elt)] | ||||||||
| ) | ||||||||
|
|
||||||||
| # RFC3454 section 2, step 2 - Normalize | ||||||||
| # RFC4013 section 2.2 normalization | ||||||||
| data = unicodedata.ucd_3_2_0.normalize("NFKC", data) | ||||||||
|
|
||||||||
| in_table_d1 = stringprep.in_table_d1 | ||||||||
| if in_table_d1(data[0]): | ||||||||
| if not in_table_d1(data[-1]): | ||||||||
| # RFC3454, Section 6, #3. If a string contains any | ||||||||
| # RandALCat character, the first and last characters | ||||||||
| # MUST be RandALCat characters. | ||||||||
| raise ValueError("SASLprep: failed bidirectional check") | ||||||||
| # RFC3454, Section 6, #2. If a string contains any RandALCat | ||||||||
| # character, it MUST NOT contain any LCat character. | ||||||||
| prohibited = prohibited + (stringprep.in_table_d2,) | ||||||||
| else: | ||||||||
| # RFC3454, Section 6, #3. Following the logic of #3, if | ||||||||
| # the first character is not a RandALCat, no other character | ||||||||
| # can be either. | ||||||||
| prohibited = prohibited + (in_table_d1,) | ||||||||
|
|
||||||||
| # RFC3454 section 2, step 3 and 4 - Prohibit and check bidi | ||||||||
| for char in data: | ||||||||
| if any(in_table(char) for in_table in prohibited): | ||||||||
| raise ValueError("SASLprep: failed prohibited character check") | ||||||||
|
|
||||||||
| return data | ||||||||
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,39 @@ | ||||||||
| # Copyright 2016-present MongoDB, Inc. | ||||||||
| # | ||||||||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||||||||
| # you may not use this file except in compliance with the License. | ||||||||
| # You may obtain a copy of the License at | ||||||||
| # | ||||||||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||||||||
| # | ||||||||
| # Unless required by applicable law or agreed to in writing, software | ||||||||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||||||||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||||
| # See the License for the specific language governing permissions and | ||||||||
| # limitations under the License. | ||||||||
|
|
||||||||
| import sys | ||||||||
| import unittest | ||||||||
| from saslprep import saslprep | ||||||||
|
|
||||||||
|
|
||||||||
| class TestSASLprep(unittest.TestCase): | ||||||||
| def test_saslprep(self): | ||||||||
| try: | ||||||||
| import stringprep # noqa | ||||||||
| except ImportError: | ||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As @barry-scott said above, |
||||||||
| self.assertRaises(TypeError, saslprep, "anything...") | ||||||||
| # Bytes strings are ignored. | ||||||||
| self.assertEqual(saslprep(b"user"), b"user") | ||||||||
| else: | ||||||||
| # Examples from RFC4013, Section 3. | ||||||||
| self.assertEqual(saslprep("I\u00ADX"), "IX") | ||||||||
| self.assertEqual(saslprep("user"), "user") | ||||||||
| self.assertEqual(saslprep("USER"), "USER") | ||||||||
| self.assertEqual(saslprep("\u00AA"), "a") | ||||||||
| self.assertEqual(saslprep("\u2168"), "IX") | ||||||||
| self.assertRaises(ValueError, saslprep, "\u0007") | ||||||||
| self.assertRaises(ValueError, saslprep, "\u0627\u0031") | ||||||||
|
|
||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we also add the test cases from mongodb's JS library? |
||||||||
| # Bytes strings are ignored. | ||||||||
| self.assertEqual(saslprep(b"user"), b"user") | ||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's test that with some non-UTF8 bytes?
Suggested change
|
||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| Added support for unicode logins/passwords for SMTP authentication. | ||
|
|
||
| This includes SASLprep support, graciously contributed by MongoDB.com. |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this as
Lib/_saslprep.pyas we aren't offering this up as a public documented stdlib module and API.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can imagine this being useful outside of smtplib. I wrote it originally for PyMongo. I would have loved to be able to just import it instead. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new stdlib module shouldn't be taken lightly.
Would it make sense to add the function to
smtplib.pydirectly?If you want this to be public documented API, then it should have documentation :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's useful in imaplib, at least, which also uses SASL. I'd love to refactor the SASL use im imaplib and smtplib, but that's outside the scope of this PR. That refactoring should probably make saslprep public, if it wasn't before? I have no opinion on whether saslprep should be public without that refactoring.
(Many thanks for all the other comments; I'll get back to this tomorrow. Meetings and another PR grabbed the mutex on my day today and kept it until now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with it living in its own file (which may be considered "required" for for license text reasons), but if this is to be a public API, I'd prefer to see the function exposed publicly from within an existing stdlib module, not a new top level module name.
It is a bit of a utility function. Another plausible stdlib module it could make sense to be exposed publicly from would be
hashlib, given its purpose.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be allowed to move the licence elsewhere? In fact we should probably add it to the long list of small licences...
After the code becomes part of Python, it will be improved, refactored and moved around. If that isn't allowed, we should IMO keep it as an external dependency.