Profiling

Sometimes computing the likelihood is not as fast as we would like. PyTensor provides handy profiling tools, which are wrapped in PyMC by model.profile. This function returns a ProfileStats object with detailed timing information about the underlying PyTensor operations. Here we'll profile the likelihood and gradient for the stochastic volatility example.

First we build the model.

import numpy as np
import pandas as pd
import pymc as pm

print(f"Running on PyMC3 v{pm.__version__}")
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Running on PyMC v5.10.0
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
# Load the data
returns = pd.read_csv(pm.get_data("SP500.csv"), index_col=0, parse_dates=True)
# Stochastic volatility example
with pm.Model() as model:
    sigma = pm.Exponential("sigma", 1.0 / 0.02, initval=0.1)  # scale of the latent volatility process
    nu = pm.Exponential("nu", 1.0 / 10)  # degrees of freedom of the Student-t likelihood
    s = pm.GaussianRandomWalk("s", sigma**-2, shape=returns.shape[0])  # latent log-volatility
    r = pm.StudentT("r", nu, lam=np.exp(-2 * s), observed=returns["change"])  # observed returns
C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\distributions\timeseries.py:293: UserWarning: Initial distribution not specified, defaulting to `Normal.dist(0, 100)`.You can specify an init_dist manually to suppress this warning.
  warnings.warn(
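
The UserWarning above comes from the GaussianRandomWalk prior: no initial distribution was specified, so PyMC falls back to Normal.dist(0, 100). A minimal sketch of how to silence it by making that default explicit (init_dist is the keyword named in the warning message; model_explicit is just an illustrative name):

# Sketch: pass the default init_dist explicitly to suppress the warning
with pm.Model() as model_explicit:
    sigma = pm.Exponential("sigma", 1.0 / 0.02, initval=0.1)
    nu = pm.Exponential("nu", 1.0 / 10)
    s = pm.GaussianRandomWalk(
        "s",
        sigma**-2,
        init_dist=pm.Normal.dist(0, 100),  # the default named in the warning
        shape=returns.shape[0],
    )
    r = pm.StudentT("r", nu, lam=np.exp(-2 * s), observed=returns["change"])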

Then we call the profile function and summarize the statistics it collects.

# Profiling of the logp call
model.profile(model.logp()).summary()
Function profiling
==================
  Message: C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:990
  Time in 1000 calls to Function.__call__: 1.331489e-01s
  Time in Function.vm.__call__: 0.10964459989918396s (82.347%)
  Time in thunks: 0.1032404899597168s (77.538%)
  Total compilation time: 9.425617e-01s
    Number of Apply nodes: 26
    PyTensor rewrite time: 8.539493e-01s
       PyTensor validate time: 2.782300e-03s
    PyTensor Linker time (includes C, CUDA code generation/compiling): 0.07904159999452531s
       Import time 3.465450e-02s
       Node make_thunk time 7.485420e-02s
           Node Composite{...}(s, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0, Composite{...}.0, Composite{...}.1, Composite{gammaln((0.5 * i0))}.0) time 4.614200e-03s
           Node Composite{...}(sigma_log__, 50.00000111758712, 0.0, 3.912023027779888, -inf) time 4.260600e-03s
           Node Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0) time 4.128100e-03s
           Node MakeVector{dtype='bool'}(All{axes=None}.0, Composite{...}.1) time 4.086700e-03s
           Node Composite{...}(nu_log__, 0, 0.1, 0.0, -2.302585092994046, -inf) time 3.909600e-03s

Time in all call to pytensor.grad() 0.000000e+00s
Time since pytensor import 4.949s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  86.3%    86.3%       0.089s       8.10e-06s     C    11000      11   pytensor.tensor.elemwise.Elemwise
   7.7%    94.0%       0.008s       3.98e-06s     C     2000       2   pytensor.tensor.math.Sum
   2.9%    96.9%       0.003s       7.51e-07s     C     4000       4   pytensor.tensor.elemwise.DimShuffle
   2.1%    99.0%       0.002s       1.06e-06s     C     2000       2   pytensor.tensor.basic.Join
   1.0%    99.9%       0.001s       3.28e-07s     C     3000       3   pytensor.tensor.subtensor.Subtensor
   0.1%   100.0%       0.000s       4.35e-08s     C     2000       2   pytensor.tensor.math.All
   0.0%   100.0%       0.000s       0.00e+00s     C     1000       1   pytensor.tensor.basic.Split
   0.0%   100.0%       0.000s       0.00e+00s     C     1000       1   pytensor.tensor.basic.MakeVector
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  77.4%    77.4%       0.080s       7.99e-05s     C     1000        1   Composite{...}
   7.7%    85.1%       0.008s       3.98e-06s     C     2000        2   Sum{axes=None}
   3.9%    89.0%       0.004s       4.00e-06s     C     1000        1   Composite{((-0.5 * sqr((i0 - i1))) - 0.9189385332046727)}
   3.1%    92.1%       0.003s       3.17e-06s     C     1000        1   Sub
   2.9%    95.0%       0.003s       7.51e-07s     C     4000        4   ExpandDims{axis=0}
   2.1%    97.0%       0.002s       1.06e-06s     C     2000        2   Join
   1.9%    99.0%       0.002s       1.98e-06s     C     1000        1   Switch
   1.0%    99.9%       0.001s       9.83e-07s     C     1000        1   Subtensor{i}
   0.1%   100.0%       0.000s       4.35e-08s     C     2000        2   All{axes=None}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{...}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Subtensor{:stop}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Subtensor{start:}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{...}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{gammaln((0.5 * i0))}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{...}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{reciprocal(sqr(i0))}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Split{2}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{((-0.5 * sqr((0.01 * i0))) - 5.524108719192764)}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   MakeVector{dtype='bool'}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Add
   ... (remaining 0 Ops account for   0.00%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  77.4%    77.4%       0.080s       7.99e-05s   1000    13   Composite{...}(s, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0, Composite{...}.0, Composite{...}.1, Composite{gammaln((0.5 * i0))}.0)
   4.7%    82.2%       0.005s       4.89e-06s   1000    24   Sum{axes=None}(lam > 0, nu > 0)
   3.9%    86.0%       0.004s       4.00e-06s   1000    17   Composite{((-0.5 * sqr((i0 - i1))) - 0.9189385332046727)}(Split{2}.1, Composite{reciprocal(sqr(i0))}.0)
   3.1%    89.1%       0.003s       3.17e-06s   1000     6   Sub(Subtensor{start:}.0, Subtensor{:stop}.0)
   3.0%    92.1%       0.003s       3.08e-06s   1000    21   Sum{axes=None}(s_logprob)
   2.0%    94.0%       0.002s       2.02e-06s   1000    19   Join(0, sigma > 0, sigma > 0)
   1.9%    96.0%       0.002s       2.00e-06s   1000    22   ExpandDims{axis=0}(All{axes=None}.0)
   1.9%    97.9%       0.002s       1.98e-06s   1000    23   Switch(ExpandDims{axis=0}.0, Composite{...}.1, [-inf])
   1.0%    98.9%       0.001s       1.00e-06s   1000     8   ExpandDims{axis=0}(Composite{...}.0)
   1.0%    99.8%       0.001s       9.83e-07s   1000     3   Subtensor{i}(s, 0)
   0.1%    99.9%       0.000s       1.07e-07s   1000    11   Join(0, ExpandDims{axis=0}.0, Sub.0)
   0.1%   100.0%       0.000s       8.70e-08s   1000    15   All{axes=None}(Composite{...}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    25   Add(mu >= 0, sigma_log__, mu >= 0, nu_log__, s_logprob, Sum{axes=None}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    20   All{axes=None}(MakeVector{dtype='bool'}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    18   MakeVector{dtype='bool'}(All{axes=None}.0, Composite{...}.1)
   0.0%   100.0%       0.000s       0.00e+00s   1000    16   Composite{((-0.5 * sqr((0.01 * i0))) - 5.524108719192764)}(Split{2}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    14   Split{2}(Join.0, 0, [   1 2905])
   0.0%   100.0%       0.000s       0.00e+00s   1000    12   Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000    10   Composite{...}(ExpandDims{axis=0}.0)
   0.0%   100.0%       0.000s       0.00e+00s   1000     9   Composite{gammaln((0.5 * i0))}(ExpandDims{axis=0}.0)
   ... (remaining 6 Apply instances account for 0.00%(0.00s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the PyTensor flag floatX=float32
  - Try installing amdlibm and set the PyTensor flag lib__amblibm=True. This speeds up only some Elemwise operation.
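
The summary shows the likelihood is dominated by a single fused Elemwise Composite node (about 77% of thunk time), so most of the work is already running as compiled C code. If the per-call timings look noisy, averaging over more calls can help; a sketch, assuming model.profile forwards an n argument controlling the call count (the 1000 calls reported above appear to be the default):

# Sketch: profile over more calls for smoother timings (n assumed)
model.profile(model.logp(), n=5000).summary()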
# Profiling of the gradient call dlogp/dx
model.profile(pm.gradient(model.logp(), vars=None)).summary()
Function profiling
==================
  Message: C:\Users\Divya Tiwari\anaconda3\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:990
  Time in 1000 calls to Function.__call__: 1.622821e-01s
  Time in Function.vm.__call__: 0.13946900120936334s (85.942%)
  Time in thunks: 0.13289427757263184s (81.891%)
  Total compilation time: 9.082990e-01s
    Number of Apply nodes: 34
    PyTensor rewrite time: 8.388561e-01s
       PyTensor validate time: 8.517501e-03s
    PyTensor Linker time (includes C, CUDA code generation/compiling): 0.059757000009994954s
       Import time 2.345270e-02s
       Node make_thunk time 5.872960e-02s
           Node Composite{...}(s, ExpandDims{axis=0}.0, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0) time 4.175900e-03s
           Node Composite{...}(nu_log__, 1.0, 0.5, -1453.0, 1453.0, -0.1, 0.0, 0) time 4.137400e-03s
           Node Composite{(i4 + i5 + (i0 * i1 * i2 * i3))}(-2.0, Sum{axes=None}.0, Composite{...}.1, Composite{...}.0, Composite{...}.2, 1.0) time 3.595000e-03s
           Node Composite{reciprocal(sqr(i0))}(ExpandDims{axis=0}.0) time 3.540400e-03s
           Node Composite{...}(sigma_log__, -50.00000111758712, 0.0, 0) time 3.315400e-03s

Time in all call to pytensor.grad() 2.053549e-01s
Time since pytensor import 6.475s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  72.1%    72.1%       0.096s       9.58e-06s     C    10000      10   pytensor.tensor.elemwise.Elemwise
   6.8%    78.9%       0.009s       1.81e-06s     C     5000       5   pytensor.tensor.elemwise.DimShuffle
   6.8%    85.7%       0.009s       3.00e-06s     C     3000       3   pytensor.tensor.math.Sum
   6.0%    91.7%       0.008s       2.67e-06s     C     3000       3   pytensor.tensor.basic.Join
   3.8%    95.5%       0.005s       1.67e-06s     C     3000       3   pytensor.tensor.subtensor.IncSubtensor
   2.3%    97.7%       0.003s       1.50e-06s     C     2000       2   pytensor.tensor.basic.Split
   1.5%    99.2%       0.002s       1.01e-06s     C     2000       2   pytensor.tensor.shape.Reshape
   0.8%   100.0%       0.001s       3.34e-07s     C     3000       3   pytensor.tensor.subtensor.Subtensor
   0.0%   100.0%       0.000s       0.00e+00s     C     3000       3   pytensor.tensor.shape.SpecifyShape
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  65.3%    65.3%       0.087s       8.68e-05s     C     1000        1   Composite{...}
   6.8%    72.1%       0.009s       3.00e-06s     C     3000        3   Sum{axes=None}
   6.0%    78.1%       0.008s       2.67e-06s     C     3000        3   Join
   3.8%    81.9%       0.005s       1.26e-06s     C     4000        4   ExpandDims{axis=0}
   3.0%    84.9%       0.004s       4.00e-06s     C     1000        1   DropDims{axis=0}
   3.0%    87.9%       0.004s       4.00e-06s     C     1000        1   Composite{...}
   2.3%    90.2%       0.003s       1.50e-06s     C     2000        2   Split{2}
   1.5%    91.7%       0.002s       1.01e-06s     C     2000        2   Reshape{1}
   1.5%    93.2%       0.002s       2.00e-06s     C     1000        1   IncSubtensor{i}
   1.5%    94.7%       0.002s       2.00e-06s     C     1000        1   Sub
   1.5%    96.2%       0.002s       2.00e-06s     C     1000        1   IncSubtensor{:stop}
   1.5%    97.7%       0.002s       2.00e-06s     C     1000        1   Composite{...}
   0.8%    98.5%       0.001s       1.00e-06s     C     1000        1   Subtensor{:stop}
   0.8%    99.2%       0.001s       1.00e-06s     C     1000        1   IncSubtensor{start:}
   0.8%   100.0%       0.001s       1.00e-06s     C     1000        1   Composite{(i4 + i5 + i6 + i7 + (i0 * i1 * i2 * i3) + i8 + i9)}
   0.0%   100.0%       0.000s       0.00e+00s     C     3000        3   SpecifyShape
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{...}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Subtensor{start:}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Subtensor{i}
   0.0%   100.0%       0.000s       0.00e+00s     C     1000        1   Composite{reciprocal(sqr(i0))}
   ... (remaining 3 Ops account for   0.00%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  65.3%    65.3%       0.087s       8.68e-05s   1000    12   Composite{...}(s, ExpandDims{axis=0}.0, [1.0435322 ... 54666e-07], ExpandDims{axis=0}.0)
   3.8%    69.1%       0.005s       5.00e-06s   1000    33   Join(0, SpecifyShape.0, SpecifyShape.0, (d__logp/ds))
   3.0%    72.1%       0.004s       4.00e-06s   1000    28   DropDims{axis=0}(SpecifyShape.0)
   3.0%    75.1%       0.004s       4.00e-06s   1000    15   Sum{axes=None}(Composite{...}.2)
   3.0%    78.1%       0.004s       4.00e-06s   1000    16   Composite{...}(Split{2}.1, Composite{reciprocal(sqr(i0))}.0)
   2.3%    80.4%       0.003s       3.03e-06s   1000     9   ExpandDims{axis=0}(Composite{...}.1)
   2.3%    82.7%       0.003s       3.00e-06s   1000    13   Split{2}(Join.0, 0, [   1 2905])
   2.3%    84.9%       0.003s       3.00e-06s   1000    14   Sum{axes=None}(Composite{...}.1)
   1.5%    86.5%       0.002s       2.03e-06s   1000    25   Reshape{1}((d__logp/dsigma_log__), [-1])
   1.5%    88.0%       0.002s       2.00e-06s   1000    29   IncSubtensor{i}(Composite{...}.0, DropDims{axis=0}.0, 0)
   1.5%    89.5%       0.002s       2.00e-06s   1000    32   IncSubtensor{:stop}(IncSubtensor{start:}.0, Neg.0, -1)
   1.5%    91.0%       0.002s       2.00e-06s   1000     6   Sub(Subtensor{start:}.0, Subtensor{:stop}.0)
   1.5%    92.5%       0.002s       2.00e-06s   1000    11   Join(0, ExpandDims{axis=0}.0, Sub.0)
   1.5%    94.0%       0.002s       2.00e-06s   1000    19   Sum{axes=None}(Composite{...}.0)
   1.5%    95.5%       0.002s       2.00e-06s   1000     4   Composite{...}(nu_log__, 1.0, 0.5, -1453.0, 1453.0, -0.1, 0.0, 0)
   0.8%    96.2%       0.001s       1.00e-06s   1000     1   Subtensor{:stop}(s, -1)
   0.8%    97.0%       0.001s       1.00e-06s   1000    30   IncSubtensor{start:}(IncSubtensor{i}.0, Split{2}.1, 1)
   0.8%    97.7%       0.001s       1.00e-06s   1000    21   Join(0, Mul.0, Composite{...}.1)
   0.8%    98.5%       0.001s       1.00e-06s   1000    17   Composite{(i4 + i5 + i6 + i7 + (i0 * i1 * i2 * i3) + i8 + i9)}(-1.0, 0.5, Sum{axes=None}.0, Composite{...}.1, Composite{...}.4, Composite{...}.3, -1452.9999999999736, Composite{...}.2, Sum{axes=None}.0, 1.0)
   0.8%    99.2%       0.001s       1.00e-06s   1000     7   ExpandDims{axis=0}(Subtensor{i}.0)
   ... (remaining 14 Apply instances account for 0.75%(0.00s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the PyTensor flag floatX=float32
  - Try installing amdlibm and set the PyTensor flag lib__amblibm=True. This speeds up only some Elemwise operation.
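
Both summaries end with the same generic PyTensor tips. The floatX suggestion can be tried by setting the flag before the model is built and compiled; a sketch (single precision can change sampling results, so test first, as the tip itself advises):

import pytensor

# Sketch: switch PyTensor to single precision before building the model;
# this trades numerical precision for potential speed.
pytensor.config.floatX = "float32"

The same flag can also be set from the shell via the PYTENSOR_FLAGS environment variable, e.g. PYTENSOR_FLAGS="floatX=float32".
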
%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Mon Dec 04 2023

Python implementation: CPython
Python version       : 3.11.6
IPython version      : 8.15.0

pandas: 2.1.3
pymc  : 5.10.0
numpy : 1.26.2

Watermark: 2.4.3