.. _Compare_PSG_events:

===============================
Evaluate Detected Events
===============================

A tool to compare two sets of events, such as detections versus expert annotations, to evaluate the performance of a detector. 
This tool can also be used to evaluate the concordance between the scoring of two experts.

**Definition of the evaluation metrics**

Let's first define the variables :

   * The **event** from the expert : ``e``
   * The **detection** : ``d``
   * True Positive (**TP**) : Correct detection (``d`` matches an ``e``).
   * False Positive (**FP**) : Incorrect detection (``d`` does not match any ``e``).
   * False Negative (**FN**) : Missed event (``e`` is not detected).
   * True Negative (**TN**) : Correct rejection (no ``e`` and no ``d``).

Evaluation metrics :

   * **Precision** : ``TP/(TP+FP)`` : Fraction of detections that are correct
   * **Recall** : ``TP/(TP+FN)`` : Fraction of events found
   * **F1 score** : ``2 x (precision x recall)/(precision + recall)``
   * **kappa** : ``(2 x (TP x TN - FN x FP))/((TP + FP) x (FP + TN) + (TP + FN) x (FN + TN))``

      .. warning::

         kappa is considered a conservative measure of agreement because the agreement expected by chance is removed from the score.
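
The formulas above can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function name ``evaluation_metrics`` is hypothetical, and returning ``0.0`` when a denominator is zero is an assumption made here to keep the example runnable.

.. code-block:: python

   def evaluation_metrics(tp, fp, fn, tn):
       """Return precision, recall, F1 score and kappa from the four counts."""
       # Assumption: a metric with a zero denominator is reported as 0.0.
       precision = tp / (tp + fp) if (tp + fp) else 0.0
       recall = tp / (tp + fn) if (tp + fn) else 0.0
       f1_den = precision + recall
       f1 = 2 * (precision * recall) / f1_den if f1_den else 0.0
       kappa_den = (tp + fp) * (fp + tn) + (tp + fn) * (fn + tn)
       kappa = 2 * (tp * tn - fn * fp) / kappa_den if kappa_den else 0.0
       return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


   # Made-up counts, for illustration only.
   metrics = evaluation_metrics(tp=500, fp=100, fn=250, tn=9150)
   print(metrics)

With these made-up counts, kappa (about 0.72) comes out slightly lower than the F1 score (about 0.74), which illustrates the conservative behaviour mentioned in the warning.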

Metrics are computed in the sample domain: the lists of events ``e`` and ``d`` are sampled at 100 Hz, so TP, TN, FP and FN are expressed in samples.

For example, TP = 500 samples means that 500 samples (5 s at 100 Hz) of the expert events are correctly detected.

* Pro : the performance evaluation is conservative (strict)
* Con : several short ``d`` can match one longer ``e`` without significant penalty, so the sample domain is not suited for evaluating event density.
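
As a rough sketch of the sample-domain counting, the example below rasterizes both event lists into boolean masks at 100 Hz and counts the samples in each category. The ``(start, duration)`` representation in seconds, the function names and the 60 s recording length are assumptions made for illustration; they are not taken from the tool.

.. code-block:: python

   import numpy as np

   FS = 100  # Hz, the sampling rate stated above


   def events_to_mask(events, total_s, fs=FS):
       """Rasterize (start_s, duration_s) events into a boolean sample mask."""
       mask = np.zeros(int(round(total_s * fs)), dtype=bool)
       for start, dur in events:
           first = int(round(start * fs))
           last = int(round((start + dur) * fs))
           mask[first:last] = True
       return mask


   def sample_counts(expert, detections, total_s, fs=FS):
       """Return (TP, FP, FN, TN) expressed in samples."""
       e = events_to_mask(expert, total_s, fs)
       d = events_to_mask(detections, total_s, fs)
       tp = int(np.sum(e & d))    # expert event and detection
       fp = int(np.sum(~e & d))   # detection without expert event
       fn = int(np.sum(e & ~d))   # expert event without detection
       tn = int(np.sum(~e & ~d))  # neither
       return tp, fp, fn, tn


   # Hypothetical events: the second expert event (2 s long) is covered
   # by two short detections of 0.5 s each.
   expert = [(10.0, 1.0), (20.0, 2.0)]
   detections = [(10.2, 1.0), (20.0, 0.5), (21.0, 0.5), (30.0, 1.0)]
   counts = sample_counts(expert, detections, total_s=60.0)
   print(counts)

In this example the two short detections both contribute true-positive samples to the same 2 s expert event, which is the behaviour described in the "Con" item above.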

Metrics are also computed in the event domain, using the Jaccard index.

**Jaccard index** : ``(intersection between e and d) / (union of e and d)``

For a ``d`` to be considered a TP, its Jaccard index with an ``e`` must exceed a given threshold.
Only one ``d`` can match an ``e`` : the one with the highest Jaccard index.

* Pro : Suited for event density.
* Con : A Jaccard index threshold must be defined.
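
The event-domain matching can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: events are represented as hypothetical ``(start, end)`` tuples in seconds, expert events are processed in order, and a detection already matched to one expert event is not reused for another.

.. code-block:: python

   def jaccard(e, d):
       """Jaccard index of two (start_s, end_s) intervals."""
       intersection = max(0.0, min(e[1], d[1]) - max(e[0], d[0]))
       union = (e[1] - e[0]) + (d[1] - d[0]) - intersection
       return intersection / union if union > 0 else 0.0


   def match_events(expert, detections, threshold):
       """Return (TP, FP, FN) counted in events."""
       used = set()  # assumption: each detection matches at most one expert event
       tp = 0
       for e in expert:
           best_j, best_i = 0.0, None
           for i, d in enumerate(detections):
               if i in used:
                   continue
               j = jaccard(e, d)
               if j > best_j:
                   best_j, best_i = j, i
           # Keep only the best candidate, and only if it exceeds the threshold.
           if best_i is not None and best_j > threshold:
               used.add(best_i)
               tp += 1
       fp = len(detections) - tp  # detections matching no expert event
       fn = len(expert) - tp      # expert events left unmatched
       return tp, fp, fn


   # Hypothetical events and threshold, for illustration only.
   expert = [(10.0, 11.0), (20.0, 22.0)]
   detections = [(10.2, 11.2), (20.0, 20.5), (21.0, 21.5), (30.0, 31.0)]
   result = match_events(expert, detections, threshold=0.3)
   print(result)

Here the two short detections inside the second expert event each reach a Jaccard index of only 0.25, so neither counts as a TP with a threshold of 0.3, and the expert event is counted as missed.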

Steps
-----------------

**1 - Input Files**

Start by opening your PSG files (.edf, .sts or .eeg). 

- **European Data Format (EDF)** : 
  
  The corresponding .tsv file is required with .edf. Both files must be saved in the same directory and share the exact same filename.

- **Stellate format (up to version 6.2)** : 
  
  The corresponding .sig file is required with the .sts. Both files must be saved in the same directory and share the exact same filename.

- **NATUS format (version 9.1)** : 
  
  (*CEAMS users only*) The entire NATUS subject folder is required.

For more details on accepted formats, see :ref:`accepted_format`.

**2 - Expert Annotation**
	
For each PSG file, select the expert events to use as the gold standard.

**3 - Detection Event**
	
For each PSG file, select the detections to compare against the expert events.

**4 - Output Files**

Select the sleep stages in which to perform the comparison (e.g. N2 for sleep spindles).

Define the Jaccard index threshold used to compute the performance evaluation.

Jaccard index : ``(intersection between e and d) / (union of e and d)``

The output performance file is written in the same directory as the PSG file.
It has the same name as the PSG file, with the additional suffix ``_perf`` and the extension ``.tsv``.
One evaluation file is generated per PSG file.
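
To locate the evaluation file from a script, the naming rule above can be reproduced as in the sketch below. The helper name ``perf_path`` and the example path are hypothetical, and the sketch assumes that the PSG extension is replaced by ``_perf.tsv``.

.. code-block:: python

   from pathlib import Path


   def perf_path(psg_file):
       """Build the expected evaluation file path for a PSG file."""
       psg = Path(psg_file)
       # Same directory, same base name, "_perf" suffix and .tsv extension.
       return psg.with_name(psg.stem + "_perf.tsv")


   # Hypothetical PSG file path, for illustration only.
   out = perf_path("recordings/subject01.edf")
   print(out)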


Version History
-----------------

* v2.1.0 : Distributed with CEAMS package version 7.2.0 — Snooz beta 2.0.1
    - Initial release of the tool.

* v2.2.0 : Distributed with CEAMS package version 7.3.0 — Snooz beta 3.0.0
    - UI improvements for consistent tool and input file descriptions.