Opened 6 years ago

Closed 6 years ago

#17385 closed enhancement (fixed)

Cleanup Graph.__init__ and DiGraph.__init__

Reported by: vdelecroix Owned by:
Priority: major Milestone: sage-6.5
Component: graph theory Keywords:
Cc: ncohen Merged in:
Authors: Vincent Delecroix Reviewers: David Coudert, Travis Scrimshaw, Nathann Cohen
Report Upstream: N/A Work issues:
Branch: 0e8d1c4 (Commits) Commit: 0e8d1c42c6fc12f59a25359059f1a5025ece20c3
Dependencies: #17384 Stopgaps:

Description

While reading #17384, I thought that a global cleanup of Graph.__init__ and DiGraph.__init__ would be good. We implement:

1)

- data = dict([u, list(data[u])] for u in data)
+ data = {u: list(v) for u,v in data.iteritems()}

2) call to uniq is bad since it calls a useless sorted: each appearance of {{{len(uniq(X))}}} is replaced with {{{len(set(X))}}}.

3) dict.iteritems is more readable and faster

sage: d = {randint(0,1000): [randint(0,1000) for _ in range(10)] for _ in range(100)}
sage: timeit("for i in d: j = d[i]", number=2000)
2000 loops, best of 3: 9.75 µs per loop
sage: timeit("for i,j in d.iteritems(): pass", number=2000)
2000 loops, best of 3: 7.3 µs per loop

4) avoid iterating through the elements of the adjacency matrix when possible

and more...

Change History (36)

comment:1 Changed 6 years ago by vdelecroix

  • Branch set to u/vdelecroix/17385
  • Status changed from new to needs_review

comment:2 Changed 6 years ago by git

  • Commit set to 3f8a98bc9caa9f64504cf2d85ce4ed39ea7ce12c

Branch pushed to git repo; I updated commit sha1. New commits:

c4bf5cftrac #17384: Slowness when calling Graph(a_dictionary)
3bea4e5trac #17385: replacement uniq -> set
3f8a98btrac #17385: various optimization

comment:3 Changed 6 years ago by dcoudert

  • Reviewers set to David Coudert
  • Status changed from needs_review to positive_review

I was not expecting bool is False to be faster than bool === False. Amazing.

sage: bool = True
sage: %timeit bool is False
10000000 loops, best of 3: 70.3 ns per loop
sage: %timeit bool == False
10000000 loops, best of 3: 88.7 ns per loop
sage:
sage: bool = False
sage: %timeit bool == False
10000000 loops, best of 3: 88.9 ns per loop
sage: %timeit bool is False
10000000 loops, best of 3: 70.2 ns per loop

For me the patch is OK (installation, test, doc), so I give positive review.

comment:4 Changed 6 years ago by vdelecroix

Thanks for the review.

To answer your interrogation is and == are very different, and is will always be faster.

The command bool is False is treated without asking anything to the object bool. It is just the Python kernel that checks whether the memory addresses match or not. For == a method of the object bool is called. Even if the comparison is implemented with something like return self is other you do not avoid that function call.

But I am not sure that those 18 non seconds are of critical importance in the creation of a graph... it was more a matter of uniformization ;-)

Vincent

comment:5 follow-up: Changed 6 years ago by tscrim

While that's an improvement, the fastest way is to do if x: or if not x:.

sage: x = True
sage: %timeit if x is True: pass
10000000 loops, best of 3: 165 ns per loop
sage: %timeit if x is False: pass
10000000 loops, best of 3: 148 ns per loop
sage: %timeit if x: pass
10000000 loops, best of 3: 91.9 ns per loop
sage: %timeit if not x: pass
10000000 loops, best of 3: 80.1 ns per loop
sage: x = False
sage: %timeit if x is True: pass
10000000 loops, best of 3: 148 ns per loop
sage: %timeit if x is False: pass
10000000 loops, best of 3: 165 ns per loop
sage: %timeit if x: pass
10000000 loops, best of 3: 79.1 ns per loop
sage: %timeit if not x: pass
10000000 loops, best of 3: 91.9 ns per loop

So I'd change things to use that. Plus you can simplify things like:

+         if ((loops is None or loops is False) and
-         if (not loops and

(and if statements like that are more common).

comment:6 in reply to: ↑ 5 Changed 6 years ago by vdelecroix

  • Status changed from positive_review to needs_work

Replying to tscrim:

While that's an improvement, the fastest way is to do if x: or if not x:.

You are right... I will do the changes (but your timings are not complete since you forgot the case x = None). There were occurrences at other places in the code. The advantage I see is that, while you read it, you know that None is one possible choice whereas

if not x and some_condition:
    if x is False:
        raise ValueError
    x = True

is a bit less verbose.

Vincent

Last edited 6 years ago by vdelecroix (previous) (diff)

comment:7 Changed 6 years ago by tscrim

True. When I was doing my timings I wasn't thinking of the None simplifications, but FTR:

sage: x = None
sage: %timeit if x is None: pass
10000000 loops, best of 3: 117 ns per loop
sage: %timeit if x is not None: pass
10000000 loops, best of 3: 102 ns per loop
sage: %timeit if x: pass
10000000 loops, best of 3: 88 ns per loop
sage: %timeit if not x: pass
10000000 loops, best of 3: 101 ns per loop

I would say the loss of verbosity (according to FF, I'm not making that word up like I thought) is covered by python ducktyping. Anyways, I don't really care that much, so if you don't think it should be changed, then feel free to leave it. Thanks for cleaning this up.

comment:8 Changed 6 years ago by dcoudert

I agree that we should try to let the code as readable as possible, especially for non critical code.

If we want to optimize further, there is apparently a small difference between x is not None and not x is None ;-)

sage: x = None
sage: %timeit if x is not None: pass
10000000 loops, best of 3: 57.8 ns per loop
sage: %timeit if not x is None: pass
10000000 loops, best of 3: 57.2 ns per loop

comment:9 Changed 6 years ago by vdelecroix

Hi,

Actually it was good to have another look at the code since I introduced a bug for weighted digraph.... (see the test in the last commit that will appear in a second).

@dcoudert: actually is not is an operator that should be as fast as is. While not involves some function call... it is strange that you get those timings.

Vincent

comment:10 Changed 6 years ago by git

  • Commit changed from 3f8a98bc9caa9f64504cf2d85ce4ed39ea7ce12c to 2a5495dc5013a99a6dd5387ef4b7fd4727250e9a

Branch pushed to git repo; I updated commit sha1. New commits:

2a5495dtrac #17384: more cleanup after reviewer's comment

comment:11 Changed 6 years ago by vdelecroix

  • Status changed from needs_work to needs_review

I am a bit worried about the code duplication. It was a pain to me to look systematically at both graph.py and digraph.py to search for potential mistakes. And most of it are very similar if not identical... if you have a magic solution for that.

comment:12 Changed 6 years ago by git

  • Commit changed from 2a5495dc5013a99a6dd5387ef4b7fd4727250e9a to 0fa9c20b5cf44989b40726e5c740c82a564a5263

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

0fa9c20trac #17385: more cleanup after reviewer's comment

comment:13 Changed 6 years ago by vdelecroix

(wrong commit message, sorry)

comment:14 follow-up: Changed 6 years ago by dcoudert

I don't understand why you write

if not loops and any(u in neighb for u,neighb in data.iteritems()):
   if loops is False:

Obviously, if the first test is True, then we have loops==False. So you don't need the second one.

The same holds for if not weighted and any(...): if weighted is False: and if not multiedges and any(...): if multiedges is False: and ...

comment:15 in reply to: ↑ 14 Changed 6 years ago by vdelecroix

Replying to dcoudert:

I don't understand why you write

if not loops and any(u in neighb for u,neighb in data.iteritems()):
   if loops is False:

Obviously, if the first test is True, then we have loops==False. So you don't need the second one.

Nope: loops can be None (or anything that does evaluate to False such as 0, [], (), etc). Have a look at 6.

Last edited 6 years ago by vdelecroix (previous) (diff)

comment:16 Changed 6 years ago by ncohen

Yoooooooo !!

"While reading #17384, I thought that a global cleanup of Graph.init and DiGraph?.init would be good. We implement:"

Aahahaha. And do you want to know why I did not rewrite it all ? Because I thought that nobody would be willing to review the patch afterwards :-P

Thanks for your branch, I will try to review it today ;-)

Nathann

comment:17 follow-up: Changed 6 years ago by vdelecroix

Already two reviewers on #17385... for a "nobody" it is a lot! But if you have improvements to share, please do.

Vincent

comment:18 in reply to: ↑ 17 Changed 6 years ago by ncohen

Yo !

Already two reviewers on #17385... for a "nobody" it is a lot! But if you have improvements to share, please do.

True true, I do not complain :-P

As to improvements, it was not really the goal: I would really like to make those functions cleaner by splitting them into subfunction, and perhaphs unify a bit better Graph/Digraph? is possible. Just a rewrite, but for a cleaner code.

Nathann

comment:19 Changed 6 years ago by ncohen

By the way there is also this thing that has been waiting for a year or so ... #15706.

Nathann

comment:20 Changed 6 years ago by ncohen

Hellooooooooo !

Well, I just finished reading the patch and it looks good, thanks for that ! One remark:

  • Error on a loop: in the doctest you added, the error messages refers to a dictionary that the user never built. Also, "no loop but a loop" is not very informative.... What about 'The graph was built with loops=False but a loop was found in its data' or something ?

Nathann

comment:21 Changed 6 years ago by ncohen

  • Status changed from needs_review to needs_info

comment:22 Changed 6 years ago by git

  • Commit changed from 0fa9c20b5cf44989b40726e5c740c82a564a5263 to ca661347a2dcd8c5d7e4720cc048028365d22092

Branch pushed to git repo; I updated commit sha1. New commits:

ca66134trac #17385: better error message

comment:23 Changed 6 years ago by vdelecroix

  • Reviewers changed from David Coudert to David Coudert, Nathann Cohen
  • Status changed from needs_info to needs_review

comment:24 Changed 6 years ago by vdelecroix

  • Reviewers changed from David Coudert, Nathann Cohen to David Coudert, Travis Scrimshaw, Nathann Cohen

comment:25 follow-up: Changed 6 years ago by ncohen

  • Status changed from needs_review to needs_work
sage -t --long /home/ncohen/.Sage/src/doc/en/thematic_tutorials/lie/affine_finite_crystals.rst
**********************************************************************
File "/home/ncohen/.Sage/src/doc/en/thematic_tutorials/lie/affine_finite_crystals.rst", line 359, in doc.en.thematic_tutorials.lie.affine_finite_crystals
Failed example:
    G.is_isomorphic(Gdual, edge_labels = True, certify = True)
Expected:
    (True, {[[1]]: [[-2]], [[2]]: [[-1]], [[-2]]: [[1]], [[-1]]: [[2]], []: [[0]]})
Got:
    (True, {[]: [[0]], [[1]]: [[-2]], [[2]]: [[-1]], [[-2]]: [[1]], [[-1]]: [[2]]})

comment:26 in reply to: ↑ 25 ; follow-up: Changed 6 years ago by vdelecroix

Replying to ncohen:

sage -t --long /home/ncohen/.Sage/src/doc/en/thematic_tutorials/lie/affine_finite_crystals.rst
**********************************************************************
File "/home/ncohen/.Sage/src/doc/en/thematic_tutorials/lie/affine_finite_crystals.rst", line 359, in doc.en.thematic_tutorials.lie.affine_finite_crystals
Failed example:
    G.is_isomorphic(Gdual, edge_labels = True, certify = True)
Expected:
    (True, {[[1]]: [[-2]], [[2]]: [[-1]], [[-2]]: [[1]], [[-1]]: [[2]], []: [[0]]})
Got:
    (True, {[]: [[0]], [[1]]: [[-2]], [[2]]: [[-1]], [[-2]]: [[1]], [[-1]]: [[2]]})

This has something to do with this ticket? Strictly speaking, those dictionary are the same... the thing is that the keys have no well defined ordering. The only reasonable thing I see is to remove this certify = True in the test. What do you think?

Vincent

comment:27 in reply to: ↑ 26 Changed 6 years ago by ncohen

This has something to do with this ticket?

It seems. Does not happen otherwise O_o

Strictly speaking, those dictionary are the same... the thing is that the keys have no well defined ordering. The only reasonable thing I see is to remove this certify = True in the test. What do you think?

It is possible to test it with the_certificate == { the actual dictionary } but it seems overkill. +1 for a certify=False.

Nathann

comment:28 Changed 6 years ago by git

  • Commit changed from ca661347a2dcd8c5d7e4720cc048028365d22092 to 472f7e0148e2a1748fd79a03a03cd70946f087d4

Branch pushed to git repo; I updated commit sha1. New commits:

472f7e0trac #17385: fix doctest in affine finite crystals

comment:29 Changed 6 years ago by vdelecroix

  • Status changed from needs_work to needs_review

comment:30 Changed 6 years ago by ncohen

  • Status changed from needs_review to positive_review

Looks good, thanks !

Nathann

P.S.: You should do a "git pull" on the develop branch

comment:31 Changed 6 years ago by vbraun

  • Status changed from positive_review to needs_work

Conflicts, probably #15706

comment:32 Changed 6 years ago by git

  • Commit changed from 472f7e0148e2a1748fd79a03a03cd70946f087d4 to a4257137090f2c8459137ba83e81cde2d099a996

Branch pushed to git repo; I updated commit sha1. New commits:

a425713Merge Sage-6.5.beta5

comment:33 Changed 6 years ago by git

  • Commit changed from a4257137090f2c8459137ba83e81cde2d099a996 to 0e8d1c42c6fc12f59a25359059f1a5025ece20c3

Branch pushed to git repo; I updated commit sha1. New commits:

0e8d1c4trac #17385: uniq -> set (remains from #15706)

comment:34 follow-up: Changed 6 years ago by vdelecroix

  • Status changed from needs_work to needs_review

Merged. I guess a last check-up would be fine.

Vincent

comment:35 in reply to: ↑ 34 Changed 6 years ago by ncohen

  • Status changed from needs_review to positive_review

Merged. I guess a last check-up would be fine.

I ran a "make ptestlong" just because my office's computer was feeling useless. No problem, good to go ;-)

Nathann

comment:36 Changed 6 years ago by vbraun

  • Branch changed from u/vdelecroix/17385 to 0e8d1c42c6fc12f59a25359059f1a5025ece20c3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.