.. _study-interface:

Analysing multiple subjects
---------------------------

fmristats provides a variety of command line tools which allow access to
most of its functionality.

Before you start, a small but important warning: more »traditional«
approaches to FMRI data analysis recommend or encourage users to
spatially smooth the data and to apply motion or slice timing
corrections prior to the statistical analysis. In fmristats, spatial
smoothing is an integral part of the model fitting process itself. When
using fmristats:

.. warning::

    * DO NOT alter your images.
    * DO NOT perform any kind of motion correction.
    * DO NOT correct for slice timing differences.
    * DO NOT smooth your data.

    Your statistical analysis will benefit, you will gain power, and you
    will be rewarded with valid statistical tests.

Introduction
............

At the beginning of the :ref:`getting-started`-chapter, we had created a
single subject study for the subject number *42* in a study called `foo`:

.. code:: shell

    fmristudy -v -o foo.study \
        --cohort foo \
        --id 42 \
        --datetime 2015-11-01-1354 \
        --paradigm HFM \
        --single-subject my-subject

This created a default file layout for all files that fmristats may
create or expect as input. You can display this file layout by
increasing the verbosity of ``fmriinfo``:

.. code:: shell

    fmriinfo -v foo.study

.. code::

    File layout:
    ------------
    stimulus       : my-subject.stimulus
    session        : my-subject.session
    reference_maps : my-subject.rigids
    population_map : my-subject.popmap
    result         : my-subject.fit

Whenever the input or output files of a command match this template,
you may simply omit the respective arguments of any fmristats command
line tool. For example, instead of writing:

.. code:: shell

    fmrifit -v \
        --fit            my-subject.fit \
        --session        my-subject.session \
        --reference-maps my-subject.rigids \
        --population-map my-subject.popmap \
        --window-radius 2 1 1 \
        --stimulus-block faces \
        --control-block shapes

You may simply write:

.. code:: shell

    fmrifit -v --study foo.study \
        --window-radius 2 1 1 \
        --stimulus-block faces \
        --control-block shapes

If the file ``foo.study`` is in your current directory, you may even
omit the ``--study`` argument:

.. code:: shell

    fmrifit -v \
        --window-radius 2 1 1 \
        --stimulus-block faces \
        --control-block shapes

At the end of the :ref:`getting-started`-chapter, we had created a
single subject study for the subject number *23* in a study called `bar`:

.. code:: shell

    fmristudy -v -o bar.study \
        --cohort bar \
        --id 23 \
        --datetime 2017-04-29-0933 \
        --paradigm HFM \
        --single-subject yes

Instead of passing a name to ``--single-subject``, we simply passed
»yes«. This created the following file templates:

.. code:: shell

    fmriinfo -v bar.study

.. code::

    File layout:
    ------------
    stimulus       : {cohort}-{id:04d}-{paradigm}-{date}.stimulus
    session        : {cohort}-{id:04d}-{paradigm}-{date}.session
    reference_maps : {cohort}-{id:04d}-{paradigm}-{date}-{rigids}.rigids
    population_map : {cohort}-{id:04d}-{paradigm}-{date}-{space}-{psi}.popmap
    result         : {cohort}-{id:04d}-{paradigm}-{date}-{space}-{psi}-{rigids}.fit

And at the end of the analysis, our directory contained the files:

.. code::

    bar-0023-HFM-2017-04-29-0933-mni152-ants-pcm.fit
    bar-0023-HFM-2017-04-29-0933-mni152-ants.popmap
    bar-0023-HFM-2017-04-29-0933-pcm.rigids
    bar-0023-HFM-2017-04-29-0933.session
    bar-0023-HFM-2017-04-29-0933.stimulus

As you can see, templates may be as complex or as simple as you like.
You may modify the templates to your liking:

.. code:: shell

    fmristudy -v --push --study bar.study \
        --fit look-at-{id}-{paradigm}-in-my-{study}.fit \
        --reference-maps {id}-{paradigm}-my-untested-method.rigids \
        --population-map my-funny-space-for-{id}.popmap

This will change the file templates to:

.. code:: shell

    fmriinfo -v bar.study

.. code::

    File layout:
    ------------
    stimulus       : {cohort}-{id:04d}-{paradigm}-{date}.stimulus
    session        : {cohort}-{id:04d}-{paradigm}-{date}.session
    reference_maps : {id}-{paradigm}-my-untested-method.rigids
    population_map : my-funny-space-for-{id}.popmap
    result         : look-at-{id}-{paradigm}-in-my-{study}.fit

Every token in curly brackets is replaced by its respective value, such
as the cohort, the subject id, or the paradigm of the FMRI session.
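
The tokens look like Python format-string fields, so ``{id:04d}`` stands
for the subject id padded with zeros to four digits. As a rough
illustration only (this mimics the substitution in plain shell and is
not how fmristats itself expands templates), the session template of
``bar.study`` resolves as follows:

.. code:: shell

    # Values of the session of subject 23 in cohort bar (from above)
    cohort=bar
    id=23
    paradigm=HFM
    date=2017-04-29-0933

    # printf's %04d plays the role of the {id:04d} token
    printf '%s-%04d-%s-%s.session\n' "$cohort" "$id" "$paradigm" "$date"

This prints ``bar-0023-HFM-2017-04-29-0933.session``, the session file
listed above.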

.. _protocol:

Creating a study instance for multiple subjects
...............................................

Say you have organised your study in a table, :download:`protocol.csv`,
that contains at least the following columns:

======   ==  ===============   ========   =====  ===
cohort   id             date   paradigm   valid  epi
======   ==  ===============   ========   =====  ===
FOO       1  2013-04-22-0930        HFM    True    3
FOO       1  2014-05-04-1700        WGT   False    3
FOO       2  2014-05-04-1930        HFM    True    3
FOO       2  2014-04-24-0645        WGT    True    3
BAR       1  2015-07-23-0930        WGT    True    3
BAR       2  2014-04-25-1400        WGT    True    3
======   ==  ===============   ========   =====  ===

In particular:

- There should be a column called »cohort« that contains a unique label
  for the cohort to which a subject belongs. Here, we have a study
  consisting of two cohorts: »FOO« and »BAR«. If your study only has a
  single cohort, simply name the cohort after your study.

- The column »id« contains integer ids which uniquely identify a subject
  within its cohort.

- The column »date« contains the date and time at which the respective
  FMRI session of a subject took place. If you are performing a
  longitudinal study with multiple measurements of the same subject with
  respect to the same stimulus design, this column allows you (and
  fmristats) to distinguish between them.

- The column »paradigm« contains a short token naming the stimulus
  design that has been used in the FMRI session.

- The column »valid« indicates whether this session can be assumed to
  contain valid data or whether it should be excluded from the
  downstream analysis. It takes one of two values: ``True`` or
  ``False``. fmristats will exclude, and not process, any entry that is
  marked ``False``.

- The last column, ``epi``, codes the direction in which the EPI slices
  of the session have been acquired:

    ==== =====================
    epi  direction
    ==== =====================
    -3   superior to inferior
    -2   anterior to posterior
    -1   right to left
     1   left to right
     2   posterior to anterior
     3   inferior to superior
    ==== =====================

    This information will be used by fmristats for handling slice
    timing differences.
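
If you prefer to create ``protocol.csv`` directly from the shell, a
single ``printf`` is enough. This is a minimal sketch that assumes
``csv2dataframe`` reads comma-separated values with a header row naming
the columns, as in the table above:

.. code:: shell

    # Write the example protocol: one header row, one row per FMRI session
    printf '%s\n' \
        'cohort,id,date,paradigm,valid,epi' \
        'FOO,1,2013-04-22-0930,HFM,True,3' \
        'FOO,1,2014-05-04-1700,WGT,False,3' \
        'FOO,2,2014-05-04-1930,HFM,True,3' \
        'FOO,2,2014-04-24-0645,WGT,True,3' \
        'BAR,1,2015-07-23-0930,WGT,True,3' \
        'BAR,2,2014-04-25-1400,WGT,True,3' > protocol.csv

    # Sanity check: 7 lines (header plus six sessions)
    wc -l < protocol.csv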

If you save the above table as a CSV to a file ``protocol.csv``, the
following creates a study instance for the above FMRI sessions:

.. code:: shell

    csv2dataframe protocol.csv protocol.pkl

    fmristudy --out tutorial.study --protocol protocol.pkl

The tool ``csv2dataframe`` parses the CSV file, generates an index from
the columns *cohort*, *id*, *paradigm*, and *date*, sorts the index, and
verifies its integrity. You can have a look at the created data frame
by calling ``fmriinfo`` on the file:

.. code:: shell

    fmriinfo protocol.pkl

.. code::

    protocol.pkl: protocol or covariates file:

    Number of entries: 6
    Entries:
                                            valid  epi
    cohort id paradigm date
    BAR    1  WGT      2015-07-23 09:30:00   True    3
           2  WGT      2014-04-25 14:00:00   True    3
    FOO    1  HFM      2013-04-22 09:30:00   True    3
              WGT      2014-05-04 17:00:00  False    3
           2  HFM      2014-05-04 19:30:00   True    3
              WGT      2014-04-24 06:45:00   True    3

    Number of valid entries:
                     valid  total
    cohort paradigm
    BAR    WGT           2      2
    FOO    HFM           2      2
           WGT           1      2

If there are fewer than 12 entries in the protocol, ``fmriinfo`` also
shows all entries of the protocol. The file ``protocol.pkl`` is indeed
nothing else than a pickled DataFrame_: you have the full power of
Pandas_ at your disposal.

The default file layout for multi-subject (multi-session) studies is:

.. code:: shell

    fmriinfo -v tutorial.study

.. code::

    File layout:
    ------------
    stimulus      : {study}/{paradigm}/{cohort}/{id:04d}/{cohort}-{id:04d}-{paradigm}-{date}.stimulus
    session       : {study}/{paradigm}/{cohort}/{id:04d}/{cohort}-{id:04d}-{paradigm}-{date}.session
    reference_maps: {study}/{paradigm}/{cohort}/{id:04d}/{cohort}-{id:04d}-{paradigm}-{date}-{rigids}.rigids
    population_map: {study}/{paradigm}/{cohort}/{id:04d}/{cohort}-{id:04d}-{paradigm}-{date}-{space}-{psi}.popmap
    result        : {study}/{paradigm}/{cohort}/{id:04d}/{cohort}-{id:04d}-{paradigm}-{date}-{space}-{psi}-{rigids}.fit

This means that at the end of your analysis of, say, paradigm *WGT*,
your file hierarchy will look something like this:

.. code:: shell

    results
    └── WGT
        └── AGB300
            ├── 0002
            │   ├── AGB300-0002-WGT-2014-04-24-0930-mni152-ants-pcm.fit
            │   ├── AGB300-0002-WGT-2014-04-24-0930-mni152-ants.popmap
            │   ├── AGB300-0002-WGT-2014-04-24-0930-pcm.rigids
            │   ├── AGB300-0002-WGT-2014-04-24-0930.session
            │   └── AGB300-0002-WGT-2014-04-24-0930.stimulus
            ├── 0003
            │   ├── AGB300-0003-WGT-2014-04-25-1445-mni152-ants-pcm.fit
            │   ├── AGB300-0003-WGT-2014-04-25-1445-mni152-ants.popmap
            │   ├── AGB300-0003-WGT-2014-04-25-1445-pcm.rigids
            │   ├── AGB300-0003-WGT-2014-04-25-1445.session
            │   └── AGB300-0003-WGT-2014-04-25-1445.stimulus
            …

Handling subject covariates
...........................

The table in :ref:`protocol` contains one entry per FMRI session and
**not** one entry per subject. The interface of fmristats suggests
managing the covariates of subjects in a separate table,
:download:`covariates.csv`, instead:

======   ==  ======  ===  ==========   =====
cohort   id     sex  age  handedness   valid
======   ==  ======  ===  ==========   =====
FOO       1  female   23        67.0    True
FOO       2  female   24        49.0    True
BAR       1  female   25        37.0   False
BAR       2    male   26       -59.0    True
======   ==  ======  ===  ==========   =====

Covariates contain information about *subjects*, such as *age*, *sex*,
case/control *status*, or *handedness*, whereas the protocol contains
information *about FMRI sessions*. In a protocol there is one entry per
FMRI session; in a covariates file there is one entry per subject.

Again, there exists a column *valid* in the table that indicates whether
a subject should be included in or excluded from the downstream analysis.
Here, for example, subject 1 in cohort BAR has been flagged as invalid,
say, due to non-compliance of the subject. The column ``valid`` is
optional and, if missing, will be set to ``True`` by default.

fmristats will only include the data of an FMRI session in the
statistical analysis when both the FMRI session is marked as valid in
the protocol **and** the respective subject is marked as valid in the
covariates file. The reasoning is that fmristats assumes that any
subject may have participated in multiple FMRI sessions and with respect
to a variety of different paradigms. The reasons for excluding a
particular FMRI session from a statistical analysis are typically
complex but generally fall into two categories: either the FMRI data are
not of sufficient quality (severe movements of the subject in the
scanner, equipment failure, presentation software did not sync, …) or
the subject had to be excluded for other, external reasons (say, due to
non-compliance with the study protocol or other exclusion criteria). The
former would be recorded in the *protocol table*, the latter in the
*covariates table*. The idea is that the covariates file may be
maintained *independently* of the respective study protocol file.
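
The inclusion rule can be illustrated on the example tables with a
short ``awk`` sketch. It only mimics the logic described above and is
not how fmristats implements it. The function name
``included_sessions`` is hypothetical, the column positions are those
of the example tables, and, as an assumption of this sketch, a subject
missing from the covariates table counts as excluded:

.. code:: shell

    # Print cohort, id, paradigm and date of every session that is valid
    # in the protocol (column 5) AND whose subject is valid in the
    # covariates table (last column).
    # Usage: included_sessions PROTOCOL.csv COVARIATES.csv
    included_sessions() {
        awk -F, '
            # First file (covariates): remember the valid flag per subject
            NR == FNR { if (FNR > 1) subject_valid[$1 FS $2] = $NF; next }
            # Second file (protocol): keep valid sessions of valid subjects
            FNR > 1 && $5 == "True" && subject_valid[$1 FS $2] == "True" {
                print $1, $2, $4, $3
            }
        ' "$2" "$1"
    }

    # The example tables from above, written to a temporary directory
    tmp=$(mktemp -d)
    printf '%s\n' \
        'cohort,id,date,paradigm,valid,epi' \
        'FOO,1,2013-04-22-0930,HFM,True,3' \
        'FOO,1,2014-05-04-1700,WGT,False,3' \
        'FOO,2,2014-05-04-1930,HFM,True,3' \
        'FOO,2,2014-04-24-0645,WGT,True,3' \
        'BAR,1,2015-07-23-0930,WGT,True,3' \
        'BAR,2,2014-04-25-1400,WGT,True,3' > "$tmp/protocol.csv"
    printf '%s\n' \
        'cohort,id,sex,age,handedness,valid' \
        'FOO,1,female,23,67.0,True' \
        'FOO,2,female,24,49.0,True' \
        'BAR,1,female,25,37.0,False' \
        'BAR,2,male,26,-59.0,True' > "$tmp/covariates.csv"

    included_sessions "$tmp/protocol.csv" "$tmp/covariates.csv"

Of the five sessions marked valid in the protocol, four remain: the WGT
session of subject 1 in cohort BAR is dropped because that subject is
marked invalid in the covariates table.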

Including covariates in your study is optional. You can add a
covariates file to your study as follows:

.. code:: shell

    csv2dataframe covariates.csv covariates.pkl

    fmristudy --out tutorial.study -v \
        --protocol protocol.pkl \
        --covariates covariates.pkl

Again, have a look at the file that has been created:

.. code:: shell

    fmriinfo tutorial.study

.. code::

    tutorial.study: study file

    Protocol:
    ---------
    Number of entries: 6
    Entries:
                                            valid  epi
    cohort id paradigm date
    BAR    1  WGT      2015-07-23 09:30:00   True    3
           2  WGT      2014-04-25 14:00:00   True    3
    FOO    1  HFM      2013-04-22 09:30:00   True    3
              WGT      2014-05-04 17:00:00  False    3
           2  HFM      2014-05-04 19:30:00   True    3
              WGT      2014-04-24 06:45:00   True    3

    Number of valid entries:
                     valid  total
    cohort paradigm
    BAR    WGT           2      2
    FOO    HFM           2      2
           WGT           1      2

    Covariates:
    -----------
    Number of entries: 4
    Entries:
                  sex  age  handedness  valid
    cohort id
    BAR    1   female   25        37.0  False
           2     male   26       -59.0   True
    FOO    1   female   23        67.0   True
           2   female   24        49.0   True

    Number of valid entries:
                   valid  total
    cohort sex
    BAR    female      0      1
           male        1      1
    FOO    female      2      2

Defining subsamples
...................

Say that, in your analysis, you are working with stricter
inclusion/exclusion criteria than the study from which your data
originate. Then you may *record* this information in a table,
:download:`subjects_to_exclude.csv`, in which you mark as invalid all
subjects that do not comply with your inclusion criteria:

======  ==  =====
cohort  id  valid
======  ==  =====
FOO      1  False
BAR      2  False
======  ==  =====

But, of course, you want to use all the information of your master
study. The solution is to update your covariates file:

.. code:: shell

    csv2dataframe subjects_to_exclude.csv subjects_to_exclude.pkl

    fmristudy -pv \
        --protocol protocol.pkl \
        --covariates covariates.pkl \
        --covariates-update subjects_to_exclude.pkl

    fmriinfo subjects_to_exclude.pkl
    fmriinfo tutorial.study

Say you have done some extensive quality control of the FMRI sessions
that you would like to include in your study, but the information has
not yet entered the official database. As you may have guessed, the
solution is to update your protocol file with a table,
:download:`sessions_to_exclude.csv`:

======   ==  ========   =====  ===
cohort   id  paradigm   valid  epi
======   ==  ========   =====  ===
FOO       1       HFM    True    3
FOO       1       WGT   False    3
FOO       2       HFM    True    3
FOO       2       WGT    True    3
BAR       1       WGT    True    3
BAR       2       WGT    True    3
======   ==  ========   =====  ===

.. code:: shell

    csv2dataframe sessions_to_exclude.csv sessions_to_exclude.pkl

    fmristudy -pv \
        --protocol protocol.pkl \
        --covariates covariates.pkl \
        --protocol-update sessions_to_exclude.pkl

    fmriinfo sessions_to_exclude.pkl
    fmriinfo tutorial.study

As the example suggests, it is possible to omit columns from the tables
used for the update (here, the »date« column). The update then applies
to all entries of the protocol or covariates table that match the
update table in the columns it does contain.

Say we are only interested in the analysis of a single paradigm, WGT,
which tests the language processing of subjects in a simple word
generation task, and only in a right-handed population (for instance,
with a `Waterloo Handedness Score`_ above +20). We select the
respective sample from the pool of available subjects by:

.. code:: shell

    fmristudy --out tutorial.study -v \
        --protocol protocol.pkl \
        --covariates covariates.pkl \
        --paradigm WGT --covariates-query  'handedness > 20'
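
The argument of ``--covariates-query`` is an expression in the column
names of the covariates table. To preview which subjects a query such
as ``handedness > 20`` would select, you can filter the CSV file
directly. The following ``awk`` sketch on the example covariates is an
illustration only, not what fmristats runs internally:

.. code:: shell

    # Example covariates table (header row plus one row per subject)
    tmp=$(mktemp -d)
    printf '%s\n' \
        'cohort,id,sex,age,handedness,valid' \
        'FOO,1,female,23,67.0,True' \
        'FOO,2,female,24,49.0,True' \
        'BAR,1,female,25,37.0,False' \
        'BAR,2,male,26,-59.0,True' > "$tmp/covariates.csv"

    # Column 5 is handedness, column 6 the valid flag; skip the header row
    right_handed=$(awk -F, 'NR > 1 && $5 + 0 > 20 { print $1, $2, $5, $6 }' "$tmp/covariates.csv")
    printf '%s\n' "$right_handed"

Subject 1 of cohort BAR passes the handedness criterion but is marked
invalid in the covariates table, so it is still excluded from the
analysis.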

.. _Pandas: https://pandas.pydata.org/pandas-docs/stable/

.. _DataFrame: https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe

.. _`Waterloo Handedness Score`: https://www.ucl.ac.uk/medical-education/resources/Waterloo/WatFoot_HandQuest36items-Elias1998.pdf

Including a template in standard space in the study instance
............................................................

If you are planning to compare the fitted parameter and statistic
fields of the subjects in your sample, you will likely work with a
template in standard space that you would like to provide as a
reference to ``fmrisample``, ``ants4pop``, or ``fsl4pop``. You may add
the template to your study instance right at the start. This ensures
that all tools work with the same template and in the same standard
space:

.. code:: shell

    nii2image --name mni152 \
        /usr/share/data/fsl-mni152-templates/MNI152_T1_2mm_brain.nii.gz \
        vb-template.image

    nii2image --name mni152-t1 \
        /usr/share/data/fsl-mni152-templates/MNI152_T1_2mm.nii.gz \
        vb-background.image

    fmristudy --out tutorial.study -v \
        --protocol protocol.pkl \
        --covariates covariates.pkl \
        --vb-image vb-template.image \
        --vb-ati-image vb-ati.image \
        --vb-background-image vb-background.image

FAQ: I have only recorded the date but not the time of my FMRI sessions
.......................................................................

Say your protocol table, :download:`protocol-short.csv`, looks as
follows:

======   ==  ==========   ========   =====  ===
cohort   id        date   paradigm   valid  epi
======   ==  ==========   ========   =====  ===
FOO       1  2013-04-22        HFM    True    3
FOO       1  2014-05-04        WGT   False    3
FOO       2  2014-05-04        HFM    True    3
FOO       2  2014-04-24        WGT    True    3
BAR       1  2015-07-23        WGT    True    3
BAR       2  2014-04-25        WGT    True    3
======   ==  ==========   ========   =====  ===

If you save the above table to ``protocol-short.csv``, then adding
``--strftime short`` lets ``csv2dataframe`` parse your file:

.. code:: shell

    csv2dataframe protocol-short.csv protocol-short.pkl --strftime short

If you also add the same argument ``--strftime short`` to ``fmristudy``
when creating your study instance:

.. code:: shell

    fmristudy -v --out tutorial.study \
        --protocol protocol-short.pkl --strftime short

then this also changes the date/time format string used when creating
or searching for files. This means that at the end of your analysis of,
say, paradigm *WGT*, your file hierarchy will look as follows:

.. code:: shell

    results
    └── WGT
        └── AGB300
            ├── 0002
            │   ├── AGB300-0002-WGT-2014-04-24-mni152-ants-pcm.fit
            │   ├── AGB300-0002-WGT-2014-04-24-mni152-ants.popmap
            │   ├── AGB300-0002-WGT-2014-04-24-pcm.rigids
            │   ├── AGB300-0002-WGT-2014-04-24.session
            │   └── AGB300-0002-WGT-2014-04-24.stimulus
            ├── 0003
            │   ├── AGB300-0003-WGT-2014-04-25-mni152-ants-pcm.fit
            │   ├── AGB300-0003-WGT-2014-04-25-mni152-ants.popmap
            │   ├── AGB300-0003-WGT-2014-04-25-pcm.rigids
            │   ├── AGB300-0003-WGT-2014-04-25.session
            │   └── AGB300-0003-WGT-2014-04-25.stimulus
            …
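
The two naming styles correspond to two strftime-style date formats.
Judging only from the file names shown in this chapter (fmristats'
actual format strings are an assumption here), the default resembles
``%Y-%m-%d-%H%M`` and the short variant ``%Y-%m-%d``. With GNU ``date``
you can check what a given session time turns into:

.. code:: shell

    # GNU date only (BSD and macOS date do not accept -d)
    session='2014-04-24 09:30'
    long=$(date -d "$session" +%Y-%m-%d-%H%M)   # like the default file names
    short=$(date -d "$session" +%Y-%m-%d)       # like --strftime short
    printf '%s\n%s\n' "$long" "$short"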

Example: Analysing multiple subjects
....................................

The following goes through an example analysis of the paradigm »WGT«,
which tests the language processing of subjects in a simple word
generation task, in a right-handed population (with a `Waterloo
Handedness Score`_ above +20).

.. code:: shell

    fmristudy --out tutorial.study -v \
        --protocol protocol.pkl \
        --protocol-update sessions_to_exclude.pkl \
        --covariates covariates.pkl \
        --covariates-query 'handedness > 20' \
        --vb-image vb-template.image \
        --vb-ati-image vb-ati.image \
        --vb-background-image vb-background.image \
        --paradigm WGT \
        --strftime short

Assume you have saved the stimulus information and FMRI data in the
following layout:

.. code::

    raw
    ├── mat
    │   └── WGT
    │       ├── AGB300-0001-WGT-2014-04-22.mat
    │       ├── AGB300-0002-WGT-2014-04-24.mat
    │       ├── AGB300-0003-WGT-2014-04-25.mat
    …       …
    │       ├── AGB300-0116-WGT-2016-04-04.mat
    │       ├── AGB300-0117-WGT-2016-04-05.mat
    │       └── AGB300-0119-WGT-2016-04-18.mat
    └── nii
        └── WGT
            ├── AGB300-0001-WGT-2014-04-22.nii
            ├── AGB300-0002-WGT-2014-04-24.nii
            ├── AGB300-0003-WGT-2014-04-25.nii
            …
            ├── AGB300-0116-WGT-2016-04-04.nii
            ├── AGB300-0117-WGT-2016-04-05.nii
            └── AGB300-0119-WGT-2016-04-18.nii

Then the following code will parse the MATLAB-coded stimulus designs
and the NIfTI-1 files of the subjects, and it will run a
foreground/background detection on each FMRI session:

.. code:: shell

    # Stimulus Design for all subjects
    mat2block -v --mat-prefix 'raw/mat/{paradigm}'

    # Session Data for all subjects
    nii2session -v --detect-foreground \
        --nii-prefix 'raw/nii/{paradigm}'

The above path prefixes are indeed the defaults. If you have organised
your files the same way, you may omit the respective arguments.
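
Since both tools locate the raw files by their names, it may be worth
checking beforehand that every stimulus file has a matching image and
vice versa. The following is a minimal sketch for the layout shown
above; the function ``check_raw_pairs`` is hypothetical, and it assumes
that stimulus and image files share the same name stem, as in the
listing:

.. code:: shell

    # Report stimulus files without an image and images without a stimulus.
    # Assumed layout: PREFIX/mat/PARADIGM/NAME.mat and PREFIX/nii/PARADIGM/NAME.nii
    check_raw_pairs() {
        local prefix=$1 paradigm=$2 file name
        for file in "$prefix/mat/$paradigm"/*.mat; do
            [ -e "$file" ] || continue   # glob matched nothing
            name=$(basename "$file" .mat)
            [ -e "$prefix/nii/$paradigm/$name.nii" ] || echo "no image for $name"
        done
        for file in "$prefix/nii/$paradigm"/*.nii; do
            [ -e "$file" ] || continue
            name=$(basename "$file" .nii)
            [ -e "$prefix/mat/$paradigm/$name.mat" ] || echo "no stimulus for $name"
        done
    }

    check_raw_pairs raw WGT

No output means that every file has its counterpart.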

The following estimates subject movements by a principal components
method and fits the population maps using ANTs. Then it fits the
default, nested model to the FMRI data. After that, non-brain areas are
pruned from the resulting statistics fields.

.. code:: shell

    # Head movement estimates for all subjects
    fmririgids -vp --cycle 42 84 102

    # Assessment of tracking ability
    ref2plot -v --window 2 1 1

    # The mni152 population map for each subject
    ants4pop -vp --cycle 42 84 102

    # Fit
    fmrifit -v --stimulus-block letter --window 2 1 1

    # Prune statistics field
    fsl4prune -v
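
If you run these steps repeatedly, for instance after updating the
protocol, it may help to collect them in a shell function that stops at
the first failing step. The following is a sketch of that idea; the
function names ``run_step`` and ``fmristats_pipeline`` and the
``DRY_RUN`` switch belong to this sketch, not to fmristats, while the
commands are exactly those listed above:

.. code:: shell

    # Run one step, or only print it when DRY_RUN=1
    run_step() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    # The steps above, in order; stop at the first step that fails
    fmristats_pipeline() {
        run_step mat2block -v --mat-prefix 'raw/mat/{paradigm}' || return
        run_step nii2session -v --detect-foreground \
            --nii-prefix 'raw/nii/{paradigm}' || return
        run_step fmririgids -vp --cycle 42 84 102 || return
        run_step ref2plot -v --window 2 1 1 || return
        run_step ants4pop -vp --cycle 42 84 102 || return
        run_step fmrifit -v --stimulus-block letter --window 2 1 1 || return
        run_step fsl4prune -v
    }

Calling ``DRY_RUN=1 fmristats_pipeline`` prints the seven commands
without running them; ``fmristats_pipeline`` runs them in the current
study directory.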