Profiling
Sometimes computing the likelihood is not as fast as we would like. PyTensor provides handy profiling tools, which are wrapped in PyMC by model.profile. This function returns a ProfileStats object conveying information about the underlying PyTensor operations. Here we'll profile the likelihood and gradient for the stochastic volatility example.
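In outline, the call pattern looks like this. This is only a sketch of what we run below; the n keyword, which sets the number of timed evaluations and defaults to 1000, is taken from Model.profile's signature in recent PyMC releases.

# Sketch: compile the expression with profiling enabled, evaluate it
# n times at the model's initial point, and print the timing tables.
profile = model.profile(model.logp(), n=1000)
profile.summary()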
import numpy as np
import pandas as pd
import pymc as pm

print(f"Running on PyMC v{pm.__version__}")

WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Running on PyMC v5.10.0

RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)

First we build the model.
# Load the data
returns = pd.read_csv(pm.get_data("SP500.csv"), index_col=0, parse_dates=True)
# Stochastic volatility example
with pm.Model() as model:
    sigma = pm.Exponential("sigma", 1.0 / 0.02, initval=0.1)
    nu = pm.Exponential("nu", 1.0 / 10)
    s = pm.GaussianRandomWalk("s", sigma**-2, shape=returns.shape[0])
    r = pm.StudentT("r", nu, lam=np.exp(-2 * s), observed=returns["change"])
C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\distributions\timeseries.py:293: UserWarning: Initial distribution not specified, defaulting to `Normal.dist(0, 100)`.You can specify an init_dist manually to suppress this warning.
warnings.warn(
Then we call the profile function and summarize its return values.
# Profiling of the logp call
model.profile(model.logp()).summary()
Function profiling
==================
Message: C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:990
Time in 1000 calls to Function.__call__: 1.331489e-01s
Time in Function.vm.__call__: 0.10964459989918396s (82.347%)
Time in thunks: 0.1032404899597168s (77.538%)
Total compilation time: 9.425617e-01s
Number of Apply nodes: 26
PyTensor rewrite time: 8.539493e-01s
PyTensor validate time: 2.782300e-03s
PyTensor Linker time (includes C, CUDA code generation/compiling): 0.07904159999452531s
Import time 3.465450e-02s
Node make_thunk time 7.485420e-02s
Node Composite{...}(s, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0, Composite{...}.0, Composite{...}.1, Composite{gammaln((0.5 * i0))}.0) time 4.614200e-03s
Node Composite{...}(sigma_log__, 50.00000111758712, 0.0, 3.912023027779888, -inf) time 4.260600e-03s
Node Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0) time 4.128100e-03s
Node MakeVector{dtype='bool'}(All{axes=None}.0, Composite{...}.1) time 4.086700e-03s
Node Composite{...}(nu_log__, 0, 0.1, 0.0, -2.302585092994046, -inf) time 3.909600e-03s
Time in all call to pytensor.grad() 0.000000e+00s
Time since pytensor import 4.949s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
86.3% 86.3% 0.089s 8.10e-06s C 11000 11 pytensor.tensor.elemwise.Elemwise
7.7% 94.0% 0.008s 3.98e-06s C 2000 2 pytensor.tensor.math.Sum
2.9% 96.9% 0.003s 7.51e-07s C 4000 4 pytensor.tensor.elemwise.DimShuffle
2.1% 99.0% 0.002s 1.06e-06s C 2000 2 pytensor.tensor.basic.Join
1.0% 99.9% 0.001s 3.28e-07s C 3000 3 pytensor.tensor.subtensor.Subtensor
0.1% 100.0% 0.000s 4.35e-08s C 2000 2 pytensor.tensor.math.All
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 pytensor.tensor.basic.Split
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 pytensor.tensor.basic.MakeVector
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
77.4% 77.4% 0.080s 7.99e-05s C 1000 1 Composite{...}
7.7% 85.1% 0.008s 3.98e-06s C 2000 2 Sum{axes=None}
3.9% 89.0% 0.004s 4.00e-06s C 1000 1 Composite{((-0.5 * sqr((i0 - i1))) - 0.9189385332046727)}
3.1% 92.1% 0.003s 3.17e-06s C 1000 1 Sub
2.9% 95.0% 0.003s 7.51e-07s C 4000 4 ExpandDims{axis=0}
2.1% 97.0% 0.002s 1.06e-06s C 2000 2 Join
1.9% 99.0% 0.002s 1.98e-06s C 1000 1 Switch
1.0% 99.9% 0.001s 9.83e-07s C 1000 1 Subtensor{i}
0.1% 100.0% 0.000s 4.35e-08s C 2000 2 All{axes=None}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Composite{...}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Subtensor{:stop}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Subtensor{start:}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Composite{...}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Composite{gammaln((0.5 * i0))}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Composite{...}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Composite{reciprocal(sqr(i0))}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Split{2}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Composite{((-0.5 * sqr((0.01 * i0))) - 5.524108719192764)}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 MakeVector{dtype='bool'}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Add
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
77.4% 77.4% 0.080s 7.99e-05s 1000 13 Composite{...}(s, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0, Composite{...}.0, Composite{...}.1, Composite{gammaln((0.5 * i0))}.0)
4.7% 82.2% 0.005s 4.89e-06s 1000 24 Sum{axes=None}(lam > 0, nu > 0)
3.9% 86.0% 0.004s 4.00e-06s 1000 17 Composite{((-0.5 * sqr((i0 - i1))) - 0.9189385332046727)}(Split{2}.1, Composite{reciprocal(sqr(i0))}.0)
3.1% 89.1% 0.003s 3.17e-06s 1000 6 Sub(Subtensor{start:}.0, Subtensor{:stop}.0)
3.0% 92.1% 0.003s 3.08e-06s 1000 21 Sum{axes=None}(s_logprob)
2.0% 94.0% 0.002s 2.02e-06s 1000 19 Join(0, sigma > 0, sigma > 0)
1.9% 96.0% 0.002s 2.00e-06s 1000 22 ExpandDims{axis=0}(All{axes=None}.0)
1.9% 97.9% 0.002s 1.98e-06s 1000 23 Switch(ExpandDims{axis=0}.0, Composite{...}.1, [-inf])
1.0% 98.9% 0.001s 1.00e-06s 1000 8 ExpandDims{axis=0}(Composite{...}.0)
1.0% 99.8% 0.001s 9.83e-07s 1000 3 Subtensor{i}(s, 0)
0.1% 99.9% 0.000s 1.07e-07s 1000 11 Join(0, ExpandDims{axis=0}.0, Sub.0)
0.1% 100.0% 0.000s 8.70e-08s 1000 15 All{axes=None}(Composite{...}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 25 Add(mu >= 0, sigma_log__, mu >= 0, nu_log__, s_logprob, Sum{axes=None}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 20 All{axes=None}(MakeVector{dtype='bool'}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 18 MakeVector{dtype='bool'}(All{axes=None}.0, Composite{...}.1)
0.0% 100.0% 0.000s 0.00e+00s 1000 16 Composite{((-0.5 * sqr((0.01 * i0))) - 5.524108719192764)}(Split{2}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 14 Split{2}(Join.0, 0, [ 1 2905])
0.0% 100.0% 0.000s 0.00e+00s 1000 12 Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 10 Composite{...}(ExpandDims{axis=0}.0)
0.0% 100.0% 0.000s 0.00e+00s 1000 9 Composite{gammaln((0.5 * i0))}(ExpandDims{axis=0}.0)
... (remaining 6 Apply instances account for 0.00%(0.00s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
- Try the PyTensor flag floatX=float32
- Try installing amdlibm and set the PyTensor flag lib__amblibm=True. This speeds up only some Elemwise operation.
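The first suggestion is easy to test without touching the model code. A minimal sketch, assuming pytensor.config.floatX is set before the model is rebuilt and recompiled; as the message says, profile again before adopting it:

import pytensor

# Assumption for illustration: float32 may speed up some Elemwise ops at
# the cost of precision. Set this before building/compiling the model,
# then re-run model.profile(...) to compare timings.
pytensor.config.floatX = "float32"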
# Profiling of the gradient call dlogp/dx
model.profile(pm.gradient(model.logp(), vars=None)).summary()
Function profiling
==================
Message: C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:990
Time in 1000 calls to Function.__call__: 1.622821e-01s
Time in Function.vm.__call__: 0.13946900120936334s (85.942%)
Time in thunks: 0.13289427757263184s (81.891%)
Total compilation time: 9.082990e-01s
Number of Apply nodes: 34
PyTensor rewrite time: 8.388561e-01s
PyTensor validate time: 8.517501e-03s
PyTensor Linker time (includes C, CUDA code generation/compiling): 0.059757000009994954s
Import time 2.345270e-02s
Node make_thunk time 5.872960e-02s
Node Composite{...}(s, ExpandDims{axis=0}.0, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0) time 4.175900e-03s
Node Composite{...}(nu_log__, 1.0, 0.5, -1453.0, 1453.0, -0.1, 0.0, 0) time 4.137400e-03s
Node Composite{(i4 + i5 + (i0 * i1 * i2 * i3))}(-2.0, Sum{axes=None}.0, Composite{...}.1, Composite{...}.0, Composite{...}.2, 1.0) time 3.595000e-03s
Node Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0) time 3.540400e-03s
Node Composite{...}(sigma_log__, -50.00000111758712, 0.0, 0) time 3.315400e-03s
Time in all call to pytensor.grad() 2.053549e-01s
Time since pytensor import 6.475s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
72.1% 72.1% 0.096s 9.58e-06s C 10000 10 pytensor.tensor.elemwise.Elemwise
6.8% 78.9% 0.009s 1.81e-06s C 5000 5 pytensor.tensor.elemwise.DimShuffle
6.8% 85.7% 0.009s 3.00e-06s C 3000 3 pytensor.tensor.math.Sum
6.0% 91.7% 0.008s 2.67e-06s C 3000 3 pytensor.tensor.basic.Join
3.8% 95.5% 0.005s 1.67e-06s C 3000 3 pytensor.tensor.subtensor.IncSubtensor
2.3% 97.7% 0.003s 1.50e-06s C 2000 2 pytensor.tensor.basic.Split
1.5% 99.2% 0.002s 1.01e-06s C 2000 2 pytensor.tensor.shape.Reshape
0.8% 100.0% 0.001s 3.34e-07s C 3000 3 pytensor.tensor.subtensor.Subtensor
0.0% 100.0% 0.000s 0.00e+00s C 3000 3 pytensor.tensor.shape.SpecifyShape
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
65.3% 65.3% 0.087s 8.68e-05s C 1000 1 Composite{...}
6.8% 72.1% 0.009s 3.00e-06s C 3000 3 Sum{axes=None}
6.0% 78.1% 0.008s 2.67e-06s C 3000 3 Join
3.8% 81.9% 0.005s 1.26e-06s C 4000 4 ExpandDims{axis=0}
3.0% 84.9% 0.004s 4.00e-06s C 1000 1 DropDims{axis=0}
3.0% 87.9% 0.004s 4.00e-06s C 1000 1 Composite{...}
2.3% 90.2% 0.003s 1.50e-06s C 2000 2 Split{2}
1.5% 91.7% 0.002s 1.01e-06s C 2000 2 Reshape{1}
1.5% 93.2% 0.002s 2.00e-06s C 1000 1 IncSubtensor{i}
1.5% 94.7% 0.002s 2.00e-06s C 1000 1 Sub
1.5% 96.2% 0.002s 2.00e-06s C 1000 1 IncSubtensor{:stop}
1.5% 97.7% 0.002s 2.00e-06s C 1000 1 Composite{...}
0.8% 98.5% 0.001s 1.00e-06s C 1000 1 Subtensor{:stop}
0.8% 99.2% 0.001s 1.00e-06s C 1000 1 IncSubtensor{start:}
0.8% 100.0% 0.001s 1.00e-06s C 1000 1 Composite{(i4 + i5 + i6 + i7 + (i0 * i1 * i2 * i3) + i8 + i9)}
0.0% 100.0% 0.000s 0.00e+00s C 3000 3 SpecifyShape
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Composite{...}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Subtensor{start:}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Subtensor{i}
0.0% 100.0% 0.000s 0.00e+00s C 1000 1 Composite{reciprocal(sqr(i0))}
... (remaining 3 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
65.3% 65.3% 0.087s 8.68e-05s 1000 12 Composite{...}(s, ExpandDims{axis=0}.0, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0)
3.8% 69.1% 0.005s 5.00e-06s 1000 33 Join(0, SpecifyShape.0, SpecifyShape.0, (d__logp/ds))
3.0% 72.1% 0.004s 4.00e-06s 1000 28 DropDims{axis=0}(SpecifyShape.0)
3.0% 75.1% 0.004s 4.00e-06s 1000 15 Sum{axes=None}(Composite{...}.2)
3.0% 78.1% 0.004s 4.00e-06s 1000 16 Composite{...}(Split{2}.1, Composite{reciprocal(sqr(i0))}.0)
2.3% 80.4% 0.003s 3.03e-06s 1000 9 ExpandDims{axis=0}(Composite{...}.1)
2.3% 82.7% 0.003s 3.00e-06s 1000 13 Split{2}(Join.0, 0, [ 1 2905])
2.3% 84.9% 0.003s 3.00e-06s 1000 14 Sum{axes=None}(Composite{...}.1)
1.5% 86.5% 0.002s 2.03e-06s 1000 25 Reshape{1}((d__logp/dsigma_log__), [-1])
1.5% 88.0% 0.002s 2.00e-06s 1000 29 IncSubtensor{i}(Composite{...}.0, DropDims{axis=0}.0, 0)
1.5% 89.5% 0.002s 2.00e-06s 1000 32 IncSubtensor{:stop}(IncSubtensor{start:}.0, Neg.0, -1)
1.5% 91.0% 0.002s 2.00e-06s 1000 6 Sub(Subtensor{start:}.0, Subtensor{:stop}.0)
1.5% 92.5% 0.002s 2.00e-06s 1000 11 Join(0, ExpandDims{axis=0}.0, Sub.0)
1.5% 94.0% 0.002s 2.00e-06s 1000 19 Sum{axes=None}(Composite{...}.0)
1.5% 95.5% 0.002s 2.00e-06s 1000 4 Composite{...}(nu_log__, 1.0, 0.5, -1453.0, 1453.0, -0.1, 0.0, 0)
0.8% 96.2% 0.001s 1.00e-06s 1000 1 Subtensor{:stop}(s, -1)
0.8% 97.0% 0.001s 1.00e-06s 1000 30 IncSubtensor{start:}(IncSubtensor{i}.0, Split{2}.1, 1)
0.8% 97.7% 0.001s 1.00e-06s 1000 21 Join(0, Mul.0, Composite{...}.1)
0.8% 98.5% 0.001s 1.00e-06s 1000 17 Composite{(i4 + i5 + i6 + i7 + (i0 * i1 * i2 * i3) + i8 + i9)}(-1.0, 0.5, Sum{axes=None}.0, Composite{...}.1, Composite{...}.4, Composite{...}.3, -1452.9999999999736, Composite{...}.2, Sum{axes=None}.0, 1.0)
0.8% 99.2% 0.001s 1.00e-06s 1000 7 ExpandDims{axis=0}(Subtensor{i}.0)
... (remaining 14 Apply instances account for 0.75%(0.00s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
- Try the PyTensor flag floatX=float32
- Try installing amdlibm and set the PyTensor flag lib__amblibm=True. This speeds up only some Elemwise operation.
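Beyond summary(), the returned ProfileStats object can also be inspected programmatically. A minimal sketch, assuming the fct_callcount and fct_call_time attributes that pytensor.compile.profiling.ProfileStats exposes:

# Sketch: pull the headline numbers out of ProfileStats directly.
stats = model.profile(pm.gradient(model.logp(), vars=None))
per_call = stats.fct_call_time / stats.fct_callcount
print(f"{stats.fct_callcount} calls, {stats.fct_call_time:.4f}s total, {per_call:.2e}s per call")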
%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Mon Dec 04 2023
Python implementation: CPython
Python version : 3.11.6
IPython version : 8.15.0
pandas: 2.1.3
pymc : 5.10.0
numpy : 1.26.2
Watermark: 2.4.3