Opened 4 years ago
Closed 3 years ago
#26608 closed defect (fixed)
Docbuild segfaults when pari is compiled with threading
Reported by: gh-timokau
Owned by: embray
Priority: major
Milestone: sage-pending
Component: documentation
Keywords: docbuild, pari
Cc: arojas, jdemeyer, fbissey, dimpase, saraedum, embray
Report Upstream: N/A
Description (last modified by )
This ticket is a follow-up to this sage-packaging discussion. To summarize:
Sage does not work with PARI's threading. Instead of relying on PARI being compiled without threading, I made use of the "nthreads" option to disable threading at runtime in #26002.
However, since #24655 (which unconditionally enabled the threaded docbuild), the docbuild segfaults when PARI is compiled with threading support. Apparently Sage somehow uses PARI while ignoring the nthreads option. We get the following backtrace (provided by Antonio with an older version of Sage):
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 113, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 65, in mapstar
    return map(*args)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 70, in build_ref_doc
    getattr(ReferenceSubBuilder(doc, lang), format)(*args, **kwds)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 720, in _wrapper
    getattr(DocBuilder, build_type)(self, *args, **kwds)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 104, in f
    runsphinx()
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/sphinxbuild.py", line 207, in runsphinx
    sphinx.cmdline.main(sys.argv)
  File "/usr/lib/python2.7/site-packages/sphinx/cmdline.py", line 296, in main
    app.build(opts.force_all, filenames)
  File "/usr/lib/python2.7/site-packages/sphinx/application.py", line 333, in build
    self.builder.build_update()
  File "/usr/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 251, in build_update
    'out of date' % len(to_build))
  File "/usr/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 265, in build
    self.doctreedir, self.app))
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 549, in update
    self._read_serial(docnames, app)
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 569, in _read_serial
    self.read_doc(docname, app)
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 677, in read_doc
    pub.publish()
  File "/usr/lib/python2.7/site-packages/docutils/core.py", line 217, in publish
    self.settings)
  File "/usr/lib/python2.7/site-packages/sphinx/io.py", line 55, in read
    self.parse()
  File "/usr/lib/python2.7/site-packages/docutils/readers/__init__.py", line 78, in parse
    self.parser.parse(self.input, document)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/__init__.py", line 191, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 171, in run
    input_source=document['source'])
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2753, in underline
    self.section(title, source, style, lineno - 1, messages)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 327, in section
    self.new_subsection(title, lineno, messages)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
    node=section_node, match_titles=True)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2328, in explicit_markup
    self.explicit_list(blank_finish)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2358, in explicit_list
    match_titles=self.state_machine.match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 319, in nested_list_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2631, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2338, in explicit_construct
    return method(self, expmatch)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2081, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2130, in run_directive
    result = directive_instance.run()
  File "/build/sagemath-doc/src/sage-8.0/src/sage_setup/docbuild/ext/sage_autodoc.py", line 1749, in run
    nested_parse_with_titles(self.state, self.result, node)
  File "/usr/lib/python2.7/site-packages/sphinx/util/nodes.py", line 208, in nested_parse_with_titles
    return state.nested_parse(content, 0, node, match_titles=1)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2326, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2338, in explicit_construct
    return method(self, expmatch)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2081, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2130, in run_directive
    result = directive_instance.run()
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/__init__.py", line 410, in run
    self.state, self.state_machine)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 189, in plot_directive
    return run(arguments, content, options, state_machine, state, lineno)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 779, in run
    close_figs=context_opt == 'close-figs')
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 644, in render_figures
    run_code(code_piece, code_path, ns, function_name)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 524, in run_code
    six.exec_(code, ns)
  File "/usr/lib/python2.7/site-packages/six.py", line 709, in exec_
    exec("""exec _code_ in _globs_, _locs_""")
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <module>
  File "sage/misc/classcall_metaclass.pyx", line 329, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1698)
    if cls.classcall is not None:
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 331, in __classcall__
    .__classcall__(cls, points, connected, fine, regular, star, defined_affine)
  File "sage/misc/cachefunc.pyx", line 1005, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6065)
    ArgSpec(args=['self', 'algorithm', 'deg_bound', 'mult_bound', 'prot'],
  File "/usr/lib/python2.7/site-packages/sage/structure/unique_representation.py", line 1027, in __classcall__
    instance = typecall(cls, *args, **options)
  File "sage/misc/classcall_metaclass.pyx", line 496, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2148)
    """
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 367, in __init__
    PointConfiguration_base.__init__(self, points, defined_affine)
  File "sage/geometry/triangulation/base.pyx", line 398, in sage.geometry.triangulation.base.PointConfiguration_base.__init__ (build/cythonized/sage/geometry/triangulation/base.cpp:4135)
    self._init_points(points)
  File "sage/geometry/triangulation/base.pyx", line 456, in sage.geometry.triangulation.base.PointConfiguration_base._init_points (build/cythonized/sage/geometry/triangulation/base.cpp:4982)
    red = matrix([ red.row(i) for i in red.pivot_rows()])
  File "sage/matrix/matrix2.pyx", line 517, in sage.matrix.matrix2.Matrix.pivot_rows (build/cythonized/sage/matrix/matrix2.c:8414)
    """
  File "sage/matrix/matrix_integer_dense.pyx", line 2217, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.pivots (build/cythonized/sage/matrix/matrix_integer_dense.c:19162)
    sage: matrix(3, range(9)).elementary_divisors()
  File "sage/matrix/matrix_integer_dense.pyx", line 2019, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.echelon_form (build/cythonized/sage/matrix/matrix_integer_dense.c:17749)
  File "sage/matrix/matrix_integer_dense.pyx", line 5719, in sage.matrix.matrix_integer_dense.Matrix_integer_dense._hnf_pari (build/cythonized/sage/matrix/matrix_integer_dense.c:46635)
    most `\max\mathcal{S}` where `\mathcal{S}` denotes the full
SignalError: Segmentation fault
That shows us that src/sage/matrix/matrix_integer_dense.pyx is involved. Apparently that file directly uses the cypari C bindings instead of the libs/pari.py interface (where the nthreads option is set). For example:
def LLL_gram(self, flag = 0):
    if self._nrows != self._ncols:
        raise ArithmeticError("self must be a square matrix")

    n = self.nrows()
    # maybe should be /unimodular/ matrices ?
    P = self.__pari__()
    try:
        U = P.qflllgram(flag)
    except (RuntimeError, ArithmeticError) as msg:
        raise ValueError("qflllgram failed, "
                         "perhaps the matrix is not positive definite")
    if U.matsize() != [n, n]:
        raise ValueError("qflllgram did not return a square matrix, "
                         "perhaps the matrix is not positive definite")
    MS = matrix_space.MatrixSpace(ZZ,n)
    U = MS(U.sage())
    # Fix last column so that det = +1
    if U.det() == -1:
        for i in range(n):
            U[i,n-1] = - U[i,n-1]
    return U
Can someone more familiar with Cython and cypari tell whether the options defined in libs/pari.py would apply here? Why isn't libs/pari.py used?
Change History (45)
comment:1 Changed 4 years ago by
comment:2 follow-up: 5 Changed 4 years ago by
I don't know if PARI uses OpenBLAS in its multi-threaded mode, but I wonder if this is related to #26585.
comment:3 Changed 4 years ago by
Note that the code excerpts in the last lines of the backtrace are nonsense since I was compiling an older version of the docs. Here's the "translated" version:
  File "sage/misc/cachefunc.pyx", line 1005, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6065)
    w = self.f(*args, **kwds)
  File "/usr/lib/python2.7/site-packages/sage/structure/unique_representation.py", line 1027, in __classcall__
    instance = typecall(cls, *args, **options)
  File "sage/misc/classcall_metaclass.pyx", line 496, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2148)
    return (<PyTypeObject*>type).tp_call(cls, args, kwds)
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 367, in __init__
    PointConfiguration_base.__init__(self, points, defined_affine)
  File "sage/geometry/triangulation/base.pyx", line 398, in sage.geometry.triangulation.base.PointConfiguration_base.__init__ (build/cythonized/sage/geometry/triangulation/base.cpp:4135)
    self._init_points(points)
  File "sage/geometry/triangulation/base.pyx", line 456, in sage.geometry.triangulation.base.PointConfiguration_base._init_points (build/cythonized/sage/geometry/triangulation/base.cpp:4982)
    red = matrix([ red.row(i) for i in red.pivot_rows()])
  File "sage/matrix/matrix2.pyx", line 517, in sage.matrix.matrix2.Matrix.pivot_rows (build/cythonized/sage/matrix/matrix2.c:8414)
    v = self.transpose().pivots()
  File "sage/matrix/matrix_integer_dense.pyx", line 2217, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.pivots (build/cythonized/sage/matrix/matrix_integer_dense.c:19162)
    E = self.echelon_form()
  File "sage/matrix/matrix_integer_dense.pyx", line 2019, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.echelon_form (build/cythonized/sage/matrix/matrix_integer_dense.c:17749)
    H_m = self._hnf_pari(flag, include_zero_rows=include_zero_rows)
  File "sage/matrix/matrix_integer_dense.pyx", line 5719, in sage.matrix.matrix_integer_dense.Matrix_integer_dense._hnf_pari (build/cythonized/sage/matrix/matrix_integer_dense.c:46635)
    sig_on()
SignalError: Segmentation fault
comment:4 Changed 4 years ago by
Description: modified
comment:5 follow-up: 10 Changed 4 years ago by
comment:6 follow-up: 12 Changed 4 years ago by
I'm not sure if it does, but it might. I grepped the pari/gp source and it doesn't use openblas_set_num_threads directly, but something else might be. I did see some reference to omp_set_num_threads, but I don't think we compile with OpenMP by default in Sage.
comment:7 Changed 4 years ago by
There's also some multi-threading support in FLINT which could be problematic, but I have no idea if that's relevant in this case.
comment:8 Changed 4 years ago by
I read on the mailing list post "It is called indirectly via matplotlib when rendering plots, see full backtrace below (btw, I had to downgrade to an old version of Sage to get a meaningful backtrace - I really dislike this trend of hiding build output, it makes it very hard to debug stuff)"
I had a very similar problem to this; it actually came from the BLAS library by way of a NumPy ufunc (I think for the "dot" product of a matrix and a vector, or of two matrices). I feel like I actually fixed this, but now I can't remember.
comment:9 Changed 4 years ago by
Do you know some direct, specific way to reproduce this so that I can try it?
comment:10 follow-up: 14 Changed 4 years ago by
Replying to gh-timokau:
Replying to embray:
I don't know if PARI uses OpenBLAS in its multi-threaded mode, but I wonder if this is related to #26585.
I'll test if that OpenBLAS patch fixes it. Very interesting ticket; I wonder if that also causes #26130 (I've heard Darwin is somewhat more prone to threading bugs).
I don't think it's related, because this only showed up when we patched fflas-ffpack to allow configuring the number of threads to use with openblas (by default it just sets it to 1). But conceivably there's a similar bug elsewhere. Possibly related to fork(). I have found many bugs in different projects related to threads/fork interaction.
comment:11 Changed 4 years ago by
You know what though--I'm looking at the relevant code in OpenBLAS, and openblas_set_num_threads(1) might actually cause it to spin up a single thread (beyond the main thread), which is actually good enough to invoke the bug I fixed. I'm going to try to confirm that though.
comment:12 Changed 4 years ago by
Replying to embray:
I'm not sure if it does, but it might. I grepped the pari/gp source and it doesn't use openblas_set_num_threads directly, but something else might be.
I don't think that PARI uses BLAS in any way.
comment:13 Changed 4 years ago by
Description: modified
comment:14 follow-up: 16 Changed 4 years ago by
Replying to embray:
I have found many bugs in different projects related to threads/fork interaction.
I'm also betting on this. PARI might set up some data structures related to threading (when compiled with threading support) which are invalid when running in a forked child process.
comment:15 Changed 4 years ago by
Description: modified
Never mind; I don't think that is the case after all.
comment:16 Changed 4 years ago by
Replying to jdemeyer:
Replying to embray:
I have found many bugs in different projects related to threads/fork interaction.
I'm also betting on this. PARI might set up some data structures related to threading (when compiled with threading support) which are invalid when running in a forked child process.
Yes, I think you must be right. PARI has its own thread management, and it does not implement any pthread_atfork handler that I can find, which is strong cause to suspect it...
comment:17 follow-up: 19 Changed 4 years ago by
ISTM PARI/GP is not even built with multi-threading enabled unless you run its Configure with either --mt=pthread or --mt=mpi. On my system anyways this is not done by default...
comment:18 follow-up: 29 Changed 4 years ago by
Now if I build PARI with --mt=pthread I can make a child process segfault if it tries to do some multi-threaded work. Granted, this is partly with my own bad code which does some things improperly. Now I'd be curious if
a) anyone having this problem is building PARI with --mt=pthread, and
b) exactly what code is being run in Sage that invokes multi-threading in PARI.
comment:19 Changed 4 years ago by
Replying to embray:
ISTM PARI/GP is not even built with multi-threading enabled unless you run its Configure with either --mt=pthread or --mt=mpi. On my system anyways this is not done by default...
Yes, that is why this issue came up on sage-packaging. Some distros ship pari with threading enabled. Sage does not. My effort in #26002 was not to change that, but to make sage compatible with a system pari.
comment:20 Changed 4 years ago by
I can also make it deadlock with the right combination of evil calls.
comment:21 Changed 4 years ago by
Even when you explicitly set nthreads to 1 before going forth with that evilness?
comment:22 Changed 4 years ago by
I don't know. I've been sick for the last week, so I completely forgot exactly where I left this. Nevertheless, now I know that unless I compile PARI with --mt=pthread the problem won't occur at all. So I'll try doing that again, and then see if I can figure out exactly what code in Sage exhibits the bug. That will help me pinpoint it.
comment:24 Changed 4 years ago by
Milestone: sage-8.5 → sage-8.7
Owner: set to embray
Totally forgot about this...
comment:25 Changed 4 years ago by
Too bad, I also forgot about this. I'm literally right now returning from a week-long PARI/GP workshop where I could have discussed this.
comment:26 Changed 4 years ago by
The alternative multiprocess doc build introduced in #27490 works as a temporary workaround. I just replaced the revert with a one-line patch to enable it unconditionally for 8.7.
comment:27 Changed 4 years ago by
Milestone: sage-8.7 → sage-pending
Removing most of the rest of my open tickets out of the 8.7 milestone, which should be closed.
comment:29 Changed 4 years ago by
Replying to embray:
Now if I build PARI with --mt=pthread I can make a child process segfault if it tries to do some multi-threaded work. Granted, this is partly with my own bad code which does some things improperly. Now I'd be curious if
I wish I knew what I meant by this, because I want to investigate this again but I don't have a clear way yet to even reproduce the issue, and I can't find whatever example code this might be referring to... :(
comment:30 Changed 4 years ago by
Some PARI notes regarding threading:
- Setting --mt=pthread in PARI's Configure script results in it compiling a test file called config/pthread.c, and if that succeeds it sets the variables thread_engine=pthread and enable_tls=yes, and should print "Using mt engine pthread" (similar result if you use mpi, but for now I'm just looking at pthread).
  - This also outputs the macro #define PARI_MT_ENGINE "pthread" in the generated paricfg.h.
  - In src/language/paricfg.c it sets a global constant const char *paricfg_mt_engine = PARI_MT_ENGINE;. This is only used when printing the version info, like:
      $ gp --version
      GP/PARI CALCULATOR Version 2.11.1 (released)
      amd64 running linux (x86-64/GMP-6.0.0 kernel) 64-bit version
      compiled: Aug 7 2019, gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.4)
      threading engine: pthread
      (readline v6.3 disabled, extended help enabled)
- In the generated Makefile this also adds some bits, such as MT_LIBS=-lpthread; the file parimt.h contains the contents of src/mt/pthread.h; and the module src/mt/pthread.c, which implements a standard interface (including the pari_mt_init function), is built alongside some generic code in src/mt/mt.c.
  - If instead we'd had --mt=single (the default), it would compile src/mt/single.c, which implements the same interfaces, mostly as no-ops (its pari_mt_init just sets the global variable pari_mt_nbthreads = 1).
- In src/mt/pthread.c there is a global pointer declared static struct mt_pstate *pari_mt. This is a pointer to a struct mt_pstate, which contains several variables related to the state of PARI's thread pool, including pointers to the threads themselves.
  - When functions like mt_queue_start() or mt_queue_start_lim() (which mt_queue_start() is just a wrapper for) are called, they initialize one of these mt_pstate structs and set the global pari_mt to point to it.
  - The global pari_mt is also referenced in mt_queue_reset().
  - pari_mt also contains a reference to one mutex, which is used when calling mt_queue_get to synchronize access to per-thread results in such a way that it blocks until a result is available (but also allows interruption by SIGINT).
comment:31 Changed 4 years ago by
and Sage installs paricfg.h with
#define PARI_MT_ENGINE "single"
Well, I can use this on #28242 to test whether we got single-threaded libpari.
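For that kind of check, a small helper that reads PARI_MT_ENGINE out of an installed paricfg.h could look like the sketch below. The function names are hypothetical, and the header's location (for example under include/pari/ of the install prefix) is an assumption that depends on how PARI was installed.

```python
import re

# Matches a line such as: #define PARI_MT_ENGINE "single"
_MT_ENGINE_RE = re.compile(r'^\s*#\s*define\s+PARI_MT_ENGINE\s+"([^"]*)"',
                           re.MULTILINE)


def pari_mt_engine(paricfg_text):
    """Return the PARI_MT_ENGINE value (e.g. "single", "pthread", "mpi") from
    the text of paricfg.h, or None if the macro is not defined there."""
    match = _MT_ENGINE_RE.search(paricfg_text)
    return match.group(1) if match else None


def is_single_threaded(paricfg_path):
    """True if the paricfg.h at paricfg_path describes a --mt=single build."""
    with open(paricfg_path) as f:
        return pari_mt_engine(f.read()) == "single"
```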
comment:32 Changed 4 years ago by
Managed to reproduce the problem again by doing a parallel docbuild after building PARI with --mt=pthread, which eventually gives a segfault.
Unfortunately the build_many function doesn't give us any info on the traceback from the original exception, so I added the following patch to the docbuild code so it would at least print the traceback:
diff --git a/src/sage_setup/docbuild/__init__.py b/src/sage_setup/docbuild/__init__.py
index e406bca..4912b5c 100644
--- a/src/sage_setup/docbuild/__init__.py
+++ b/src/sage_setup/docbuild/__init__.py
@@ -49,6 +49,7 @@ import shutil
 import subprocess
 import sys
 import time
+import traceback
 import warnings
 
 logger = logging.getLogger(__name__)
@@ -136,6 +137,8 @@ def builder_helper(type):
             if ABORT_ON_ERROR:
                 raise
         except BaseException as e:
+            exc_type, exc_value, exc_traceback = sys.exc_info()
+            traceback.print_tb(exc_traceback)
             # We need to wrap a BaseException that is not an Exception in a
             # regular Exception. Otherwise multiprocessing.Pool.get hangs, see
             # #25161
With that, I get a long traceback (much of which is stuff in Sphinx that isn't interesting). But as previously reported--and unsurprisingly--the segfault originates from some code for a plot:
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 644, in render_figures
    run_code(code_piece, code_path, ns, function_name)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 524, in run_code
    six.exec_(code, ns)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/six.py", line 709, in exec_
    exec("""exec _code_ in _globs_, _locs_""")
  File "<string>", line 1, in <module>
  File "<string>", line 2, in <module>
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/categories/finite_coxeter_groups.py", line 750, in permutahedron
    vertices = [v.change_ring(AA) for v in vertices]
  File "sage/modules/free_module_element.pyx", line 1495, in sage.modules.free_module_element.FreeModuleElement.change_ring (build/cythonized/sage/modules/free_module_element.c:11182)
    return M(self.list(), coerce=True)
  File "sage/structure/parent.pyx", line 902, in sage.structure.parent.Parent.__call__ (build/cythonized/sage/structure/parent.c:9225)
    return mor._call_with_args(x, args, kwds)
  File "sage/structure/coerce_maps.pyx", line 171, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_with_args (build/cythonized/sage/structure/coerce_maps.c:4872)
    return C._element_constructor(x, **kwds)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/modules/free_module.py", line 5601, in _element_constructor_
    return FreeModule_generic_field._element_constructor_(self, e, *args, **kwds)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/modules/free_module.py", line 1028, in _element_constructor_
    return self.element_class(self, x, coerce, copy)
  File "sage/modules/free_module_element.pyx", line 4119, in sage.modules.free_module_element.FreeModuleElement_generic_dense.__init__ (build/cythonized/sage/modules/free_module_element.c:28564)
    entries = [coefficient_ring(x) for x in entries]
  File "sage/structure/parent.pyx", line 900, in sage.structure.parent.Parent.__call__ (build/cythonized/sage/structure/parent.c:9198)
    return mor._call_(x)
  File "sage/structure/coerce_maps.pyx", line 157, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_ (build/cythonized/sage/structure/coerce_maps.c:4449)
    return C._element_constructor(x)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/qqbar.py", line 759, in _element_constructor_
    return x._algebraic_(AA)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/universal_cyclotomic_field.py", line 606, in _algebraic_
    return R(QQbar(self))
  File "sage/structure/parent.pyx", line 900, in sage.structure.parent.Parent.__call__ (build/cythonized/sage/structure/parent.c:9198)
    return mor._call_(x)
  File "sage/categories/map.pyx", line 789, in sage.categories.map.Map._call_ (build/cythonized/sage/categories/map.c:6953)
    cpdef Element _call_(self, x):
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/universal_cyclotomic_field.py", line 260, in _call_
    zeta = QQbar.zeta(k)
  File "sage/misc/cachefunc.pyx", line 1949, in sage.misc.cachefunc.CachedMethodCaller.__call__ (build/cythonized/sage/misc/cachefunc.c:10274)
    w = self._instance_call(*args, **kwds)
  File "sage/misc/cachefunc.pyx", line 1825, in sage.misc.cachefunc.CachedMethodCaller._instance_call (build/cythonized/sage/misc/cachefunc.c:9759)
    return self.f(self._instance, *args, **kwds)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/qqbar.py", line 1366, in zeta
    nf = CyclotomicField(n)
  File "sage/structure/factory.pyx", line 369, in sage.structure.factory.UniqueFactory.__call__ (build/cythonized/sage/structure/factory.c:2146)
    return self.get_object(version, key, kwds)
  File "sage/structure/factory.pyx", line 406, in sage.structure.factory.UniqueFactory.get_object (build/cythonized/sage/structure/factory.c:2350)
    return self._cache[version, cache_key]
  File "sage/misc/weak_dict.pyx", line 704, in sage.misc.weak_dict.WeakValueDictionary.__getitem__ (build/cythonized/sage/misc/weak_dict.c:3653)
    cdef PyObject* wr = PyDict_GetItemWithError(self, k)
  File "sage/cpython/dict_del_by_value.pyx", line 58, in sage.cpython.dict_del_by_value.PyDict_GetItemWithError (build/cythonized/sage/cpython/dict_del_by_value.c:1261)
    ep = mp.ma_lookup(mp, <PyObject*><void*>key, PyObject_Hash(key))
  File "sage/rings/real_lazy.pyx", line 1398, in sage.rings.real_lazy.LazyNamedUnop.__hash__ (build/cythonized/sage/rings/real_lazy.c:15533)
    return hash(complex(self))
  File "sage/rings/real_lazy.pyx", line 822, in sage.rings.real_lazy.LazyFieldElement.__complex__ (build/cythonized/sage/rings/real_lazy.c:9872)
    return self.eval(complex)
  File "sage/rings/real_lazy.pyx", line 1352, in sage.rings.real_lazy.LazyNamedUnop.eval (build/cythonized/sage/rings/real_lazy.c:14982)
    arg = self._arg.eval(R)
  File "sage/rings/real_lazy.pyx", line 1129, in sage.rings.real_lazy.LazyBinop.eval (build/cythonized/sage/rings/real_lazy.c:12722)
    left = self._left.eval(R)
  File "sage/rings/real_lazy.pyx", line 1130, in sage.rings.real_lazy.LazyBinop.eval (build/cythonized/sage/rings/real_lazy.c:12734)
    right = self._right.eval(R)
  File "sage/rings/real_lazy.pyx", line 1647, in sage.rings.real_lazy.LazyAlgebraic.eval (build/cythonized/sage/rings/real_lazy.c:17915)
    self.eval(self.parent().interval_field(64)) # up the prec
  File "sage/rings/real_lazy.pyx", line 1673, in sage.rings.real_lazy.LazyAlgebraic.eval (build/cythonized/sage/rings/real_lazy.c:18066)
    roots = self._poly.roots(ring = AA if isinstance(self._parent, RealLazyField_class) else QQbar)
  File "sage/rings/polynomial/polynomial_element.pyx", line 7721, in sage.rings.polynomial.polynomial_element.Polynomial.roots (build/cythonized/sage/rings/polynomial/polynomial_element.c:61844)
    rts = complex_roots(self, retval='algebraic')
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/polynomial/complex_roots.py", line 258, in complex_roots
    rts = cfac.roots(multiplicities=False)
  File "sage/rings/polynomial/polynomial_element.pyx", line 7629, in sage.rings.polynomial.polynomial_element.Polynomial.roots (build/cythonized/sage/rings/polynomial/polynomial_element.c:59297)
    ext_rts = self.__pari__().polroots(precision=L.prec())
  File "sage/rings/polynomial/polynomial_element.pyx", line 6021, in sage.rings.polynomial.polynomial_element.Polynomial.__pari__ (build/cythonized/sage/rings/polynomial/polynomial_element.c:49003)
    return self._pari_with_name(self._parent.variable_name())
  File "sage/rings/polynomial/polynomial_element.pyx", line 6074, in sage.rings.polynomial.polynomial_element.Polynomial._pari_with_name (build/cythonized/sage/rings/polynomial/polynomial_element.c:49395)
    vals = [x.__pari__() for x in self.list()]
  File "sage/rings/complex_number.pyx", line 593, in sage.rings.complex_number.ComplexNumber.__pari__ (build/cythonized/sage/rings/complex_number.c:7107)
    return self.real().__pari__()
  File "sage/rings/real_mpfr.pyx", line 3248, in sage.rings.real_mpfr.RealNumber.__pari__ (build/cythonized/sage/rings/real_mpfr.c:22355)
    sig_on()
Here it's probably building the reference docs for sage.categories.finite_coxeter_groups, which contains a plot using CoxeterGroup.permutahedron(), which in turn, much further down, happens to use PARI to compute some complex roots of a polynomial.
It looks like it's not actually reaching the polroots() call before crashing somewhere down in converting the polynomial coeffs to PARI values...?
comment:33 Changed 4 years ago by
The nffactor function in PARI ("Factorization of the univariate polynomial (or rational function) T over the number field nf") internally uses a parallel Chinese remainder routine in the internal function polint_chinese. This results in a thread pool being launched. This is being called at some point during the categories reference doc build.
comment:34 Changed 4 years ago by
If #26002 has any effect, the pool should have just a single thread, no?
comment:35 Changed 4 years ago by
On further digging: yes, I just found #26002, and PARI indeed has pari_mt_nbthreads = 1, so maybe I'm barking up the wrong tree. The polint_chinese call indeed does not use any threads in this case.
comment:37 follow-up: 39 Changed 4 years ago by
On a wild guess, I tried switching the docbuild to use my multiprocessing.Pool replacement from #27490 (this is on Linux), and the crashes no longer occur.
The major difference is that in multiprocessing.Pool, each docbuild subprocess is forked from a thread other than the main thread, whereas in my replacement they're forked directly from the main thread.
Somehow this alone is enough to leave some structures in PARI in a bad state, and (apparently) only if PARI was built with multithreading support in the first place.
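For contrast, here is a minimal, hypothetical sketch of the other approach: each job runs in a child forked directly from the calling (main) thread, so no helper thread ever calls fork(). This is not the #27490 implementation; `run_forked_from_main` and its `max_workers` parameter are illustrative names, and it assumes a POSIX system with `os.fork` (Python 3.9+ for `os.waitstatus_to_exitcode`).

```python
import os


def run_forked_from_main(func, items, max_workers=2):
    """Run func(item) for each item in a child process forked from the
    calling thread, with at most max_workers children alive at once.

    Hypothetical sketch (POSIX only). Returns {item: exit_code}, where 0
    means func returned normally and 1 means it raised an exception.
    """
    pending = list(items)
    running = {}  # pid -> item
    exit_codes = {}
    while pending or running:
        # Fork new children from *this* thread until the limit is reached.
        while pending and len(running) < max_workers:
            item = pending.pop(0)
            pid = os.fork()
            if pid == 0:  # child process: run the job, never return
                try:
                    func(item)
                except BaseException:
                    os._exit(1)
                os._exit(0)
            running[pid] = item
        # Reap one finished child before forking more.
        pid, status = os.waitpid(-1, 0)
        item = running.pop(pid)
        exit_codes[item] = os.waitstatus_to_exitcode(status)
    return exit_codes


# Example: the job for 0 raises ZeroDivisionError, so its child exits with 1.
codes = run_forked_from_main(lambda x: 1 / x, [1, 2, 0])
print(codes)
```

The trade-off is that the main thread must manage the children itself, instead of delegating worker replacement to a background thread as multiprocessing.Pool does.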
comment:38 follow-up: 40 Changed 4 years ago by
Going back to comment 30: if PARI is configured with --mt=pthread, then the variable enable_tls="yes" is set in PARI's Configure script. This in turn leads to a macro called ENABLE_TLS being defined in paricfg.h.
The only effect of ENABLE_TLS is in src/headers/parisys.h:
#ifdef ENABLE_TLS
# define THREAD __thread
#else
# define THREAD
#endif
So variables in PARI declared THREAD use thread-local storage (TLS) in this case. This alone could be enough to suspect strange behavior in PARI when it's forked from a thread...
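The effect can be reproduced in pure Python, with `threading.local` standing in for a C `__thread` variable. This is an analogy, not PARI's actual code; `fake_pari_state` and `mainstack` are hypothetical names, and it assumes a POSIX system with `os.fork`. Because fork() copies only the calling thread into the child, a child forked from a helper thread sees that thread's thread-local values, not the ones the main thread initialized.

```python
import os
import threading

# Stand-in for PARI's __thread globals (e.g. pari_mainstack): a value that
# is "initialized" only in the thread that set up the library.
fake_pari_state = threading.local()
fake_pari_state.mainstack = "initialized"  # done once, in the main thread


def fork_and_inspect():
    """Fork from the calling thread and return what the child process sees
    in fake_pari_state.mainstack ("unset" if the attribute is missing)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: only the thread that called fork() exists here
        os.close(read_fd)
        value = getattr(fake_pari_state, "mainstack", "unset")
        os.write(write_fd, value.encode())
        os._exit(0)
    os.close(write_fd)
    seen = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return seen


# Fork directly from the main thread.
from_main = fork_and_inspect()

# Fork from a helper thread, as a pool's background thread would.
results = {}
helper = threading.Thread(target=lambda: results.update(child=fork_and_inspect()))
helper.start()
helper.join()
from_helper = results["child"]

print(from_main, from_helper)  # prints: initialized unset
```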
comment:39 Changed 4 years ago by
Replying to embray:
On a wild guess, I tried switching the docbuild to use my multiprocessing.Pool replacement from #27490 (this is on Linux), and the crashes no longer occur.
This matches what I observed on #28242 (and the workaround I added to that branch), so we're in the same wilderness :-)
comment:40 Changed 4 years ago by
Replying to embray:
So variables in PARI declared THREAD use TLS in this case. This alone could be enough to suspect strange behavior in PARI when it's forked from a thread...
Yup. That's really all it is. PARI has tons of global variables declared __thread, including, not least of all, pari_mainstack. When multiprocessing.Pool spins up a new thread, all of those __thread variables start out at their initialization values (typically 0x0) in that thread, and therefore also in any process forked from it.
This isn't so unusual in PARI's case. It assumes that it alone starts the threads it manages, and it does not expect to be used inside someone else's multi-threaded application.
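One defensive pattern for a library with these assumptions is to record which thread initialized it and refuse calls from any other thread, turning a silent segfault into a clear Python exception. The sketch below is hypothetical: `ThreadBoundLibrary` and `call` are illustrative names, not part of cypari2 or Sage.

```python
import threading


class ThreadBoundLibrary:
    """Hypothetical wrapper for a non-thread-safe library whose global state
    lives in thread-local storage of the thread that initialized it."""

    def __init__(self):
        # Remember the initializing thread (for PARI, where it was set up).
        self._owner = threading.get_ident()

    def call(self, func, *args):
        """Run func(*args), but only from the owning thread."""
        if threading.get_ident() != self._owner:
            raise RuntimeError(
                "library used from a thread other than the one that "
                "initialized it; its thread-local state is not set up here"
            )
        return func(*args)


lib = ThreadBoundLibrary()
print(lib.call(pow, 2, 10))  # 1024: called from the owning thread

errors = []


def worker():
    try:
        lib.call(pow, 2, 10)
    except RuntimeError as exc:
        errors.append(exc)


t = threading.Thread(target=worker)
t.start()
t.join()
print(len(errors))  # 1: the call from the other thread was refused
```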
comment:41 Changed 3 years ago by
#28356 proposes a workaround for this issue.
It won't solve the issue in general (PARI is not safe to use in arbitrary multi-threaded code), but at least it won't crash when building the docs.
comment:42 follow-ups: 43 44 Changed 3 years ago by
Did somebody check that this problem is limited to docbuilds? Does the Sage testsuite pass (apart from doc-related tests of course)?
comment:43 Changed 3 years ago by
Replying to jdemeyer:
Did somebody check that this problem is limited to docbuilds? Does the Sage testsuite pass (apart from doc-related tests of course)?
Yes, there are no test suite failures related to this.
comment:44 Changed 3 years ago by
Replying to jdemeyer:
Did somebody check that this problem is limited to docbuilds? Does the Sage testsuite pass (apart from doc-related tests of course)?
In principle it's not limited to docbuilds: the broader problem, for which I would like to find a better resolution, is that cypari2 Pari instances are simply not thread-safe. A more general approach would be to make Sage's single Pari instance thread-local.
As it is, Sage doesn't use threads for much of anything, whether in the tests or in general, so the problem arises primarily in the docbuild.
But this can cause problems for anyone who carelessly tries to use multiprocessing.Pool, or threads in general, in Sage* :(
* If they do anything in those threads that happens to use PARI.
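A minimal sketch of the "thread-local instance" idea, assuming each thread can safely create its own independent instance. For real PARI that would require per-thread initialization, which this sketch does not attempt; `FakePari` and `get_pari` are hypothetical stand-ins, not cypari2 APIs.

```python
import threading


class FakePari:
    """Hypothetical stand-in for a library handle with per-thread state."""

    def __init__(self):
        # Record which thread created this instance.
        self.owner = threading.get_ident()


_local = threading.local()


def get_pari():
    """Return the calling thread's instance, creating it on first use."""
    inst = getattr(_local, "pari", None)
    if inst is None:
        inst = _local.pari = FakePari()
    return inst


main_instance = get_pari()
print(get_pari() is main_instance)  # True: same thread, same instance

seen = []
t = threading.Thread(target=lambda: seen.append(get_pari()))
t.start()
t.join()
print(seen[0] is main_instance)  # False: the new thread got its own instance
```

Code that always goes through an accessor like this never touches another thread's state, at the cost of one instance (and its memory) per thread.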
comment:45 Changed 3 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
I think there's a little bit of misinformation / misconception here.
There's nothing about Sage's docbuild program that uses multi-threading. It uses a process pool and builds each sub-document in separate processes.
(There are some cases where it does not run builds in subprocesses when it probably should, and I think that is contributing somewhat to the explosion of memory usage in the docbuild, but that's a separate issue).
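A rough sketch of the process-pool structure described above, under stated assumptions: `build_one` and the document names are hypothetical placeholders, and `maxtasksperchild=1` is assumed here so every sub-document gets a fresh worker process. With that setting, CPython's `multiprocessing.Pool` starts replacement workers from its internal worker-handler thread rather than from the main thread, which is the detail comment 37 points at. The explicit `"fork"` context (POSIX only) matches the Linux behavior discussed in this ticket.

```python
import multiprocessing
import os


def build_one(doc):
    """Hypothetical placeholder for building one sub-document; returns the
    document name and the PID of the process that built it."""
    return doc, os.getpid()


def build_docs_in_pool(docs, processes=2):
    # "fork" context: each worker is a copy of the thread that forked it.
    ctx = multiprocessing.get_context("fork")
    # maxtasksperchild=1: each worker exits after one task, so the pool's
    # internal handler thread forks a replacement for the next one.
    with ctx.Pool(processes, maxtasksperchild=1) as pool:
        return pool.map(build_one, docs, chunksize=1)


results = build_docs_in_pool(["categories", "rings", "combinat"])
for doc, pid in results:
    print(doc, "built in process", pid)
```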