.. _subject-inference:

Subject analysis: Inference at a single coordinate
--------------------------------------------------

A signal model has been fitted to FMRI session data:

.. code:: shell

    fmrifit -v --stimulus-block letter --window 2 1 1 --id 2

    fsl4prune -v --id 2


The Python interface will usually give you some further flexibility.

Create the model
................

Load the FMRI data, the head movements, and the population map from disk:


.. code:: python

    
    from fmristats import *
    
    # the FMRI data
    session = load(path_to_session)
    
    # the fitted head movements (reference maps)
    reference_maps = load(path_to_reference_maps)
    
    # smooth out the movement estimates a little
    reference_maps.mean(2)
    reference_maps.mean(1)
    reference_maps.mean(1)
    
    # load a diffeomorphism from disk
    population_map = load(path_to_population_map)


Create the model: Set the stimulus design in the model, set the
MRI-observations, and set the only(!) hyperparameter of the
MB-estimator, namely the curvature of the spatial weighting kernel.


.. code:: python

    
    model = SignalModel(
        session=session,
        reference_maps=reference_maps,
        population_map=population_map)
    
    model.set_stimulus_design(
            s='letter',
            c='control',
            offset=5.242,
            preset=1.42)
    
    # discard the data of the first four scan cycles
    model.set_data(burn_in=4)
    
    # set the width of the Gaussian kernel that is used for spatial
    # weighting
    model.set_hyperparameters('max')


.. code::

    AGB300-0002-WGT:
                Number of within brain observations:  2,760,279
                Number of    non brain observations: 25,502,121
                Intercept field refers to time:          165.35 s
    

Save the model to disk (if you like):


.. code:: python

    model.save('subject.model')


Query information
.................

Get some information about the FMRI data:


.. code:: python

    print(session.reference.describe())
    print(session.describe())
    print(reference_maps.describe())


.. code::

    
            Resolution (left to right):          3.00 mm
            Resolution (posterior to anterior):  3.00 mm
            Resolution (inferior to superior):   4.60 mm
            Diagonal of one voxel:               6.26 mm
            Volume of one voxel:                41.40 mm^3
            Aspect 0 on 1:                       1.00
            Aspect 0 on 2:                       0.65
            Aspect 1 on 2:                       0.65
    
            EPI code: 3
    
            Scans:        6900
            Valid:        6900
            Outlying:        0 (0.00%)
    
            Scan cycles:   230
            Valid:         230
            Outlying:        0 (0.00%)
    

Get some information about the model:


.. code:: python

    print(model.describe())


.. code::

    
            Hyperparameters:
                Scale type:         max
                Factor:            3.00
    
            Resulting in:
                Mass:            0.9973
                Scale:             2.30 mm
                FWHM:              5.42 mm
                Radius:            6.90 mm
                Diagonal:         13.80 mm
    

Fit the model
.............


.. code:: python

    result = model.fit()


Save the result to disk (if you like):


.. code:: python

    result.save('subject.fit')


Prune the statistics field from non-brain areas. You are expected to do
his manually, as fmristats will usually avoid deleting data:


.. code:: python

    result.mask()


Select the highest peak in the t-field
......................................

Select the highest peak in the t-statistics field:


.. code:: python

    tstatistics = result.get_field('task', 'tstatistic')
    
    index = np.unravel_index(np.nanargmax(tstatistics.data), tstatistics.data.shape)
    coordinate = tstatistics.apply_to_index(index)
    
    print("""
        index in the image :          {}
        coordinate in standard space: {}
        hight of the peak:            {:.2f}""".format(
            index, coordinate, tstatistics.data[index]))


.. code::

    
        index in the image :          (39, 23, 40)
        coordinate in standard space: [ 12. -80.   8.]
        hight of the peak:            7.64
    

Plot the respective horizontal slice that goes through the above peak:


.. code:: python

    from fmristats.plot import picture
    
    picture(tstatistics,3,1,1,[index[-1]],
        interpolation='bilinear', mark_peak=True)


.. image:: figures/subject-inference-at-one-point_figure13_1.png
   :width: 15 cm


Find out the name of standard space, so that you may use a query tool to
find the region to which this coordinate belongs:


.. code:: python

    print(population_map.describe())


.. code::

    
            Population map
            --------------
            diffeomorphism: ants
            vb:             mni152
            nb:             AGB300-0002-WGT-42
            vb_background:  mni152-t1
            nb_background:  --
            vb_estimate:    vb_estimate_ants
            nb_estimate:    nb_estimate_ants
            vb_mask:        --
            nb_mask:        --
            vb_ati:         --
    

Use the ``atlasquery`` tool from the FSL project to see to which
structure these coordinates belong:

.. code:: shell

    atlasquery -a "MNI Structural Atlas" -c 12,-80,8
    MNI Structural Atlas
    40% Occipital Lobe

Create a scatter plot of FMRI data around a coordinate
......................................................

Extract the data at the index:


.. code:: python

    
    data = model.data_at_index(index).reset_index(drop=True)
    print(data.iloc[:,3:].head())


.. code::

       signal        time  task  block  cycle  slice    weight
    0   958.0 -154.664538   0.0    0.0    7.0   11.0  0.020139
    1   986.0 -154.664538   0.0    0.0    7.0   11.0  0.048931
    2  1031.0 -154.664538   0.0    0.0    7.0   11.0  0.021690
    3  1073.0 -154.664538   0.0    0.0    7.0   11.0  0.069630
    4  1108.0 -154.664538   0.0    0.0    7.0   11.0  0.169175
    

Some information about the number of blocks and block labels of the
stimulus design:


.. code:: python

    # Task labels
    print('Task labels:               {}'.format(data.task.unique()))
    
    # Block labels and number of blocks
    print('Block labels:              {}'.format(data.block.unique()))
    print('Number of blocks per task: {}'.format(len(data.block.unique())))
    
    # Scan cycles within task blocks and their total number
    print('Number of scan cycles:     {}'.format(len(data.cycle.unique())))


.. code::

    Task labels:               [0. 1.]
    Block labels:              [0. 1. 2. 3. 4. 5. 6. 7.]
    Number of blocks per task: 8
    Number of scan cycles:     94
    

For visualisation, define a suitable scale for the scatters in relation
to their distance to :math:`x`, and order the MRI-observations by time
of acquisition:


.. code:: python

    
    data.time = data.time - data.time.min()
    elem, idx, count = np.unique(data.time,return_index=True,return_counts=True)
    
    w = data.weight
    w = (w-w.min())/(w.max()-w.min())
    w = .87*np.log(w+1.069)


Create the scatter plot (click the image to enlarge):


.. code:: python

    
    import matplotlib.pylab as pt
    import seaborn as sb
    sb.set_style('whitegrid')
    pt.style.use(['seaborn-whitegrid', 'seaborn-colorblind'])
    
    fig, ax = pt.subplots(figsize=(8,9))
    ax.scatter(data.time, data.signal, s=w, marker='o', c='k')
    ax.set_ylabel('MR Signal')
    ax.set_xlabel('Time (s)')
    sb.despine()
    pt.tight_layout()


.. image:: figures/subject-inference-at-one-point_figure18_1.png
   :width: 15 cm


Fit different models to the data at the coordinate
..................................................

fmristats is very flexible when it comes to defining models for your
FMRI data. In deed, you have the full power of patsy_ at your disposal.

.. _patsy: https://patsy.readthedocs.io/en/latest/

Model 1: fully saturated and nested task/block design
#####################################################

This is a very parsimonious and robust model for FMRI and it is the
default as well as the generally recommended model when using fmristats.


.. code:: python

    
    fit1, model1, _ = model.fit_at_index(index,
        formula='signal~C(task)/C(block, Sum)')
    
    data['y1'] = fit1.predict()
    predict1 = fit1.predict()[idx]


Model 2: linear in time and task
################################

This appears to be a common model. Note, however, that the model is
unable to account for nested effects which typically will increase the
residual mean squared error of the fit, thus, reducing your power.


.. code:: python

    
    fit2, model2, _ = model.fit_at_index(index,
        formula='signal~time+C(task)')
    
    data['y2'] = fit2.predict()
    predict2 = fit2.predict()[idx]


Model 3: b-spline model with 12 degrees of freedom
##################################################

This model is insanely flexible and may be used as an example to show
the effects of over-fitting.


.. code:: python

    
    fit3, model3, _ = model.fit_at_index(index,
        formula='signal~bs(time,df=12)+C(task)')
    
    data['y3'] = fit3.predict()
    predict3 = fit3.predict()[idx]


Plot the fitted regression surfaces
...................................

Plot of the fitted regression surfaces (click image to enlarge):


.. code:: python

    
    current_palette = sb.color_palette()
    c1 = current_palette[0]
    c2 = current_palette[1]
    c3 = current_palette[2]
    
    pt.close()
    fig, ax = pt.subplots()
    ax.plot(elem,predict1,'-',c=c1,lw=.9,label='task/block')
    ax.plot(elem,predict2,'--',c=c2,lw=.9,label='task+time')
    ax.plot(elem,predict3,':',c=c3,lw=.9,label='task+bs(time,df=12)')
    ax.legend()
    ax.set_ylabel('Expected Signal as Predicted by Model')
    ax.set_xlabel('Time (s)')
    sb.despine()
    pt.tight_layout()
    
    #Add block labels:
    positions = data.groupby(['task', 'block']).mean()
    positions = positions[['time','y1','y2','y3']]
    positions.reset_index(inplace=True)
    positions.set_index(['task', 'block', 'time'], inplace=True)
    positions = positions.stack()
    positions.index.names = ['task', 'block', 'time', 'model']
    positions = positions.reset_index()
    positions.set_index(['task', 'block', 'model'], inplace=True)
    positions.rename(columns={0:'value'}, inplace=True)
    positions.head(10)
    
    positions = positions.reset_index()
    positions = positions.groupby(['task','block','time']).max()
    positions.reset_index(inplace=True)
    positions.head(10)
    
    a = fit1.params[0]
    b = fit1.params[1] + a
    
    for r in positions.itertuples():
        if np.isclose(r.task,0):
            label = r'$B_{}$'.format(int(r.block)+1)
        else:
            label = r'$A_{}$'.format(int(r.block)+1)
        ax.annotate(label, xy=(r.time-1, r.value+.5))


.. image:: figures/subject-inference-at-one-point_figure22_1.png
   :width: 15 cm


You may also combine the two plots (click image to enlarge):


.. code:: python

    
    figw = 11.842
    figh = 5.442
    
    figw = 11.842
    figh = 7.442
    
    a = fit1.params[0]
    b = fit1.params[1] + a
    
    left       = 0.1
    right      = 0.95
    maintop    = 0.99
    mainbottom = 0.38
    meantop    = 0.36
    meanbottom = 0.1
    
    pt.figure(figsize=(figw,figh))
    
    # The top figure:
    main_ax = pt.axes([left,mainbottom,right-left,maintop-mainbottom])
    mean_ax = pt.axes([left,meanbottom,right-left,meantop-meanbottom],
        sharex=main_ax)
    main_ax.scatter(data.time,data.signal,c='k',s=w,marker='o')
    main_ax.set_ylabel('Observed MR Signal')
    main_ax.plot(elem,predict1,'-',c=c1,lw=1.2,label='task/block')
    pt.setp(main_ax.get_xticklabels(),visible=False)
    sb.despine()
    
    # The bottom figure:
    mean_ax.axhline(a, c='grey', alpha=.3, lw=10)#, label='mean signal in A')
    mean_ax.axhline(b, c='lightgrey', alpha=.3, lw=10)#, label='mean signal in B')
    mean_ax.plot(elem,predict1,'-',c=c1,lw=1.2,
            label='task/block [fully saturated]')
    mean_ax.plot(elem,predict2,'--',c=c2,lw=.8,
            label='task+time [linear in time]')
    mean_ax.plot(elem,predict3,':',c=c3,lw=.8,
            label='task+bs(time,df=12) [high temporal flexibility]')
    mean_ax.set_ylabel('Estimated Mean\nby Model')
    mean_ax.set_xlabel('Time (s)')
    mean_ax.legend(loc='lower center', bbox_to_anchor=(.5, -0.43), ncol=3)
    
    #Add block labels:
    for r in positions.itertuples():
        if np.isclose(r.task,0):
            label = r'$B_{}$'.format(int(r.block)+1)
        else:
            label = r'$A_{}$'.format(int(r.block)+1)
        mean_ax.annotate(label, xy=(r.time-1, r.value+1.8))
    sb.despine()
    
    # Save the plot to disk:
    pt.savefig(scatter_plot, pad_inches=0, bbox_inches='tight')


.. image:: figures/subject-inference-at-one-point_figure24_1.png
   :width: 15 cm


Formal testing
..............

Test for non-zero task related effects.

Effect strength
###############


.. code:: python

    
    print(fit1.params[1])
    print(fit2.params[1])
    print(fit3.params[1])


.. code::

    14.526104774712344
    14.057526450085476
    13.339944081795096
    

Test statistic
##############


.. code:: python

    
    print(fit1.tvalues[1])
    print(fit2.tvalues[1])
    print(fit3.tvalues[1])


.. code::

    7.460551272843621
    7.112567149684203
    4.4370014597571314
    

P-Values
########


.. code:: python

    
    print(fit1.pvalues[1])
    print(fit2.pvalues[1])
    print(fit3.pvalues[1])


.. code::

    1.0980615941762318e-13
    1.3918174784009391e-12
    9.422750022149465e-06
    

Test whether time is an influential factor in model 2 at this
coordinate:


.. code:: python

    print(fit2.pvalues[2])


.. code::

    0.08344187219441945
    

Test whether time is an influential factor in model 2 at this
coordinate:


.. code:: python

    R = np.identity(len(fit3.params))
    R = R[2:,:]
    print(fit3.f_test(R))


.. code::

    <F test: F=array([[1.9721569]]), p=0.022923767880701367,
    df_denom=3273, df_num=12>
    

It can safely be argued that time has no significant influence, and that
the data suggest no temporal signal fluctuation at :math:`x`. Note that
even if there were, the non-parametric time dimension of model 1 would
still be able to handle it.

All p-values need to be adjusted for multiple testing.