gh-99151: Improve performance and error readability of unittest.TestCase.assertDictEqual#126923
gh-99151: Improve performance and error readability of unittest.TestCase.assertDictEqual#126923merlinz01 wants to merge 12 commits intopython:mainfrom
Conversation
…DictEqual The function previously used a simple difflib.ndiff on top of a pprint.pformat of each dict, which resulted in very bad performance on large dicts and unclear assertion error outputs in many cases. This change formats the diffs in a more readable manner by inspecting the differences between the dicts, truncating long keys and values, and justifying values in the various groups of lines.
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Eclips4
left a comment
There was a problem hiding this comment.
Can you please provide benchmarks testing the old and new approaches?
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
|
Comparing two dicts with 10000 keys and values of 20 random bytes each. class TestDictCompare(unittest.TestCase):
def test_compare_dicts(self):
first = self.generate_dict()
second = self.generate_dict()
self.assertDictEqual(first, second)
def generate_dict(self):
length = 10000
d = {}
for _ in range(length):
d[random.randbytes(20)] = random.randbytes(20)
return dThe previous implementation takes 15.3 seconds: The new implementation takes 0.28 seconds: |
|
For the test in the linked issue, I'm getting 6.6 seconds with the old vs. 0.22 seconds with the new. |
|
An example of the new output: (edited to reflect changes in fcfdd92) |
|
Thanks! That's a pretty nice speedup! I'm just wondering, do we want to show the |
|
I'll remove the explanation line for identical keys and values. The old implementation doesn't omit anything, so if we omit values I think there should be some indication that there are more values than what is shown, rather than silently dropping them. |
tomasr8
left a comment
There was a problem hiding this comment.
Now that the diffing is more complex, I think it needs more tests to cover the possible outputs :)
|
This PR is stale because it has been open for 30 days with no activity. |
The function previously used a simple
difflib.ndiffon top of apprint.pformatof each dict, which resulted in very bad performance on large dicts and unclear assertion error outputs in many cases. This change formats the diffs in a more readable manner by inspecting the differences between the dicts, truncating long keys and values, and justifying values in the various groups of lines.