Extraction Module

Module: pyradise.fileio.extraction

General

The extraction module provides class prototypes, simple implementations, and examples of extractors which are intended to be used to retrieve information from file paths or DICOM files to construct Modality, Organ, and Annotator instances. Typically, extractors are used in combination with a Crawler to retrieve the necessary information for Subject construction which happens during loading.
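As a rough illustration of how a crawler-like loop might consult extractors, the following sketch classifies file paths with two plain functions standing in for extractors. Nothing in it is PyRaDiSe API: `classify_files`, the stand-in functions, the `img_`/`seg_` prefixes, and the file names are assumptions made for this example.

```python
import os
from typing import Callable, Dict, Iterable, Optional

# Hypothetical stand-in for an extractor: maps a path to a name or None.
ExtractFn = Callable[[str], Optional[str]]


def modality_from_name(path: str) -> Optional[str]:
    name = os.path.basename(path)
    # In this made-up naming scheme only 'img_' files are intensity images.
    if not name.startswith('img_'):
        return None
    return next((m for m in ('T1c', 'T1w', 'T2w', 'FLAIR', 'CT') if m in name), None)


def organ_from_name(path: str) -> Optional[str]:
    name = os.path.basename(path)
    # In this made-up naming scheme only 'seg_' files are segmentations.
    if not name.startswith('seg_'):
        return None
    return next((o for o in ('Brainstem', 'Eyes') if o in name), None)


def classify_files(paths: Iterable[str],
                   modality_fn: ExtractFn,
                   organ_fn: ExtractFn
                   ) -> Dict[str, Dict[str, Optional[str]]]:
    """Ask every stand-in extractor about every file, keyed by file name."""
    return {os.path.basename(path): {'modality': modality_fn(path),
                                     'organ': organ_fn(path)}
            for path in paths}


files = ['/data/subject_1/img_T1c.nii.gz',
         '/data/subject_1/seg_Brainstem.nii.gz']
info = classify_files(files, modality_from_name, organ_from_name)
```

Each stand-in returns None for files it is not responsible for, which lets the loop tell intensity images and segmentations apart.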

When working with DICOM data, extractors provide an alternative to generating and maintaining modality configuration files. This alternative is especially useful if the data is well organized and the necessary information can be retrieved easily from the data. If the data varies and contains ambiguous content, we recommend using modality configuration files instead because they are more flexible.

Class Overview

The following abstract Extractor classes are provided by the extraction module:

Class                 Description
Extractor             Base class for all Extractor subclasses
ModalityExtractor     Prototype Extractor for Modality extraction on discrete images and DICOM images.
OrganExtractor        Prototype Extractor for Organ extraction from discrete images.
AnnotatorExtractor    Prototype Extractor for Annotator extraction from discrete images.

The following concrete Extractor classes are provided by the extraction module:

Class                       Description
SimpleModalityExtractor     A simple ModalityExtractor.
SimpleOrganExtractor        A simple OrganExtractor.
SimpleAnnotatorExtractor    A simple AnnotatorExtractor.

Details

class Extractor

Bases: ABC

An abstract base class for all extractors. An extractor extracts information about a file from its file path, the file's content, or any other source of data in order to provide identification information (e.g. the imaging modality of a certain NIFTI file). Extractors can be used in combination with a Crawler to extract the Modality, Organ or Annotator instances for Subject construction.

Typically, the user needs to implement concrete extractor classes specific to the task at hand. This provides flexibility and allows for a wide range of use cases. However, the user can also use the provided implementations and examples to get started quickly.

abstract extract(path)

Extract information about the file at the specified path.

Parameters:

path (str) – The path to the file for which information needs to be extracted.

Returns:

The extracted information.

Return type:

Any
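The contract of an abstract extractor can be sketched without PyRaDiSe installed. The `Extractor` class below is a stand-in that mirrors the documented interface, not the library class, and `FolderAnnotatorExtractor` together with its folder layout is a hypothetical example.

```python
import os
from abc import ABC, abstractmethod
from typing import Any, Optional


class Extractor(ABC):
    """Stand-in that mirrors the documented interface; not the PyRaDiSe class."""

    @abstractmethod
    def extract(self, path: str) -> Any:
        """Extract information about the file at the specified path."""
        raise NotImplementedError


class FolderAnnotatorExtractor(Extractor):
    """Hypothetical extractor that reads the annotator name from the parent folder."""

    def extract(self, path: str) -> Optional[str]:
        folder = os.path.basename(os.path.dirname(path))
        # Return None when the path has no parent folder to read a name from.
        return folder if folder else None


annotator = FolderAnnotatorExtractor().extract(
    '/data/subject_1/rater_A/seg_Brainstem.nii.gz')

# The abstract base class itself cannot be instantiated.
try:
    Extractor()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

Because `extract()` is abstract, Python refuses to instantiate a subclass that does not implement it.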

class ModalityExtractor(return_default=False)

Bases: Extractor

A prototype class to extract the Modality from DICOM files and discrete image file paths. It must be implemented by the user and is intended to be used with the Crawler types for DICOM and discrete image files. Thus, both abstract methods (i.e. extract_from_dicom() and extract_from_path()) need to be implemented. When working exclusively with DICOM files or exclusively with discrete image files, the unused method may simply return None.

Important

If the file path does not specify an intensity image the extractor must return None.

Warning

If return_default is set to True, the ModalityExtractor returns an enumerated default Modality for each file for which no modality could be extracted, so no error is raised during loading. This functionality is intended exclusively for experimentation and debugging, so that the user can load data without implementing a complete extractor. It is not recommended for production use because subsequent errors may arise.

Notes

If using the ModalityExtractor in combination with a Crawler all paths to the discrete image files are provided sequentially to extract the Modality. In case of working with DICOM data the Crawler will provide just one arbitrary file path to the ModalityExtractor.

Example

Example of a ModalityExtractor implementation to identify detailed modalities:

>>> import os
>>> from typing import (Any, Dict, Optional)
>>>
>>> from pyradise.fileio import (ModalityExtractor, Tag)
>>> from pyradise.data import Modality
>>>
>>>
>>> class ExampleModalityExtractor(ModalityExtractor):
>>>
>>>     @staticmethod
>>>     def _get_mr_modality(ds_dict: Dict[str, Any]) -> Optional[Modality]:
>>>         # check for different variants of attributes to get the sequence
>>>         # identification
>>>         scanning_sq = ds_dict.get('Scanning Sequence', {}).get('value', [])
>>>         scanning_sq = [scanning_sq] if isinstance(scanning_sq, str) else scanning_sq
>>>         contrast = ds_dict.get('Contrast/Bolus Agent', {}).get('value', '')
>>>
>>>         if all(val in scanning_sq for val in ('SE', 'IR')):
>>>             return Modality('FLAIR')
>>>         elif all(val in scanning_sq for val in ('GR', 'IR')) and len(contrast) > 0:
>>>             return Modality('T1c')
>>>         elif all(val in scanning_sq for val in ('GR', 'IR')) and len(contrast) == 0:
>>>             return Modality('T1w')
>>>         elif len(scanning_sq) > 0 and all(val == 'SE' for val in scanning_sq):
>>>             return Modality('T2w')
>>>         else:
>>>             return None
>>>
>>>     def extract_from_dicom(self, path: str) -> Optional[Modality]:
>>>         # extract the necessary attributes from the file
>>>         tags = (Tag(0x0008, 0x0060),  # Modality
>>>                 Tag(0x0018, 0x0010),  # ContrastBolusAgent
>>>                 Tag(0x0018, 0x0020))  # ScanningSequence
>>>         dataset_dict = self._load_dicom_attributes(tags, path)
>>>
>>>         # identify the modality
>>>         extracted_modality = dataset_dict.get('Modality', {}).get('value', None)
>>>         if extracted_modality == 'CT':
>>>             return Modality('CT')
>>>         elif extracted_modality == 'MR':
>>>             return self._get_mr_modality(dataset_dict)
>>>         else:
>>>             return None
>>>
>>>     def extract_from_path(self, path: str) -> Optional[Modality]:
>>>         # extract the necessary attributes from the file name
>>>         file_name = os.path.basename(path)
>>>         if 'T1c' in file_name:
>>>             return Modality('T1c')
>>>         elif 'T1w' in file_name:
>>>             return Modality('T1w')
>>>         elif 'T2w' in file_name:
>>>             return Modality('T2w')
>>>         elif 'FLAIR' in file_name:
>>>             return Modality('FLAIR')
>>>         elif 'CT' in file_name:
>>>             return Modality('CT')
>>>         else:
>>>             return None
Parameters:

return_default (bool) – Indicates if an enumerated default Modality should be returned if the extraction was not successful. Use this option exclusively for experimentation and debugging because it hides extraction failures and may cause subsequent errors (default: False).

is_enumerated_default_modality(modality)

Check if the specified modality is an enumerated default modality.

Parameters:

modality (Optional[Union[Modality, str]]) – The modality to check.

Returns:

True if the modality is an enumerated default modality, False otherwise.

Return type:

bool
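The idea of an enumerated default can be sketched as follows. This is not the library's implementation: the naming scheme `DefaultModality_<n>`, the `DefaultingExtractor` class, and the prefix-based check are assumptions chosen only to illustrate the behavior described for return_default.

```python
import itertools
from typing import Callable, Optional

DEFAULT_PREFIX = 'DefaultModality_'  # hypothetical naming scheme


class DefaultingExtractor:
    """Wraps an extraction function and optionally substitutes numbered defaults."""

    def __init__(self,
                 extract_fn: Callable[[str], Optional[str]],
                 return_default: bool = False
                 ) -> None:
        self._extract_fn = extract_fn
        self._return_default = return_default
        self._counter = itertools.count()

    def extract(self, path: str) -> Optional[str]:
        modality = self._extract_fn(path)
        # Only fall back to a numbered default when explicitly requested.
        if modality is None and self._return_default:
            return f'{DEFAULT_PREFIX}{next(self._counter)}'
        return modality

    @staticmethod
    def is_enumerated_default(modality: Optional[str]) -> bool:
        return isinstance(modality, str) and modality.startswith(DEFAULT_PREFIX)


def never_matches(path: str) -> Optional[str]:
    return None


strict = DefaultingExtractor(never_matches)
lenient = DefaultingExtractor(never_matches, return_default=True)
first = lenient.extract('a.nii.gz')
second = lenient.extract('b.nii.gz')
```

In this sketch the strict variant keeps returning None, so a failed extraction stays visible, while the lenient variant hides it behind a placeholder name.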

abstract extract_from_dicom(path)

Extract the Modality from the DICOM file at the specified path. If the modality cannot be detected, None must be returned.

Notes

For your implementation you can load the DICOM file or specific DICOM attributes using the load_dataset() or load_dataset_tag() functions from the pyradise.utils module. For a detailed description of the DICOM attributes we refer to the DICOM Standard and the DICOM Standard Browser.

Parameters:

path (str) – The path to the DICOM file to extract the Modality from.

Returns:

The extracted Modality or None.

Return type:

Optional[Modality]

abstract extract_from_path(path)

Extract the Modality from the file path to a discrete image file or from another data source. If the modality cannot be detected, None must be returned.

Parameters:

path (str) – The path to the file to extract the Modality for.

Returns:

The extracted Modality or None.

Return type:

Optional[Modality]

extract(path)

Extract the Modality for either a DICOM or a discrete medical image file.

Parameters:

path (str) – The path to the file to extract the Modality for.

Returns:

The extracted Modality or None.

Return type:

Optional[Modality]
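How extract() tells DICOM files from discrete image files is not described here. One plausible approach, sketched below, checks for the "DICM" marker that DICOM Part 10 files carry after a 128-byte preamble. The functions `looks_like_dicom` and `extract` and the lambda stand-ins are hypothetical and not part of PyRaDiSe.

```python
import os
import tempfile
from typing import Callable, Optional

ExtractFn = Callable[[str], Optional[str]]


def looks_like_dicom(path: str) -> bool:
    """Return True if the file has the 'DICM' marker at byte offset 128."""
    try:
        with open(path, 'rb') as file:
            file.seek(128)
            return file.read(4) == b'DICM'
    except OSError:
        return False


def extract(path: str, from_dicom: ExtractFn, from_path: ExtractFn) -> Optional[str]:
    """Dispatch to the DICOM or the file-path extraction function."""
    if looks_like_dicom(path):
        return from_dicom(path)
    return from_path(path)


# Demonstration with two small temporary files.
with tempfile.TemporaryDirectory() as tmp:
    dicom_path = os.path.join(tmp, 'slice_0001.dcm')
    with open(dicom_path, 'wb') as file:
        file.write(b'\x00' * 128 + b'DICM')

    nifti_path = os.path.join(tmp, 'img_T1w.nii.gz')
    with open(nifti_path, 'wb') as file:
        file.write(b'not a dicom file')

    dicom_result = extract(dicom_path, lambda p: 'CT', lambda p: None)
    nifti_result = extract(nifti_path, lambda p: None, lambda p: 'T1w')
```

Checking the file content rather than the extension has the advantage that DICOM files without a ".dcm" suffix are still recognized, although files written without the Part 10 preamble would be missed by this check.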

class SimpleModalityExtractor(modalities, return_default=False)

Bases: ModalityExtractor

A simple ModalityExtractor implementation. For a DICOM image it uses the ‘Modality’ attribute. For a discrete image file it searches the file name for one of the provided modality names (modalities) and generates a Modality with the matching name. If no match is found, None is returned.

Parameters:
  • modalities (Tuple[str, ...]) – The possible modality names for the intensity files which will also be used to name the Modality.

  • return_default (bool) – Indicates if an enumerated default Modality should be returned if the extraction was not successful. Use this option exclusively for experimentation and debugging because it hides extraction failures and may cause subsequent errors (default: False).

extract_from_path(path)

Extract the Modality from the file name using the provided modalities. If there is no match None is returned.

Parameters:

path (str) – The path to the file to extract the Modality for.

Returns:

The extracted Modality or None.

Return type:

Optional[Modality]
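Matching names against a file name has a pitfall when one candidate is a substring of another. The sketch below uses plain substring matching to show it; whether SimpleModalityExtractor matches in this way is not stated here, and `match_first` and `match_longest` are hypothetical helpers.

```python
import os
from typing import Optional, Sequence


def match_first(path: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate contained in the file name, or None."""
    file_name = os.path.basename(path)
    for candidate in candidates:
        if candidate in file_name:
            return candidate
    return None


def match_longest(path: str, candidates: Sequence[str]) -> Optional[str]:
    """Try longer candidates first so 'T1c' wins over its substring 'T1'."""
    return match_first(path, sorted(candidates, key=len, reverse=True))


path = '/data/subject_1/img_T1c.nii.gz'
naive = match_first(path, ('T1', 'T1c'))
robust = match_longest(path, ('T1', 'T1c'))
```

With the candidates ordered as ('T1', 'T1c'), the naive variant reports 'T1' for a T1c image. Choosing names that do not contain each other, or ordering them from longest to shortest, avoids the ambiguity.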

extract_from_dicom(path)

Extract the DICOM attribute ‘Modality’ from the provided DICOM file. If no or an invalid ‘Modality’ attribute is found, None is returned.

Notes

This method exclusively extracts the following top-level modalities: CT, MR, PT, and US. For all other values of the DICOM ‘Modality’ attribute None is returned.

Parameters:

path (str) – The path to the DICOM file to extract the Modality from.

Returns:

The extracted Modality or None.

Return type:

Optional[Modality]
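The restriction to top-level modalities amounts to a simple filter on the attribute value. The following sketch assumes the value has already been read from the file; `to_top_level_modality` is a hypothetical helper, and stripping padding and upper-casing the value are assumptions of this example rather than documented behavior.

```python
from typing import Optional

# The four top-level modalities named in the note above.
TOP_LEVEL_MODALITIES = ('CT', 'MR', 'PT', 'US')


def to_top_level_modality(value: Optional[str]) -> Optional[str]:
    """Return the value if it is a supported top-level modality, else None."""
    if not isinstance(value, str):
        return None
    # Assumption: tolerate padding and lower-case input before comparing.
    normalized = value.strip().upper()
    return normalized if normalized in TOP_LEVEL_MODALITIES else None


ct = to_top_level_modality('CT')
padded_mr = to_top_level_modality('MR ')
rtstruct = to_top_level_modality('RTSTRUCT')
missing = to_top_level_modality(None)
```

Values such as RTSTRUCT are valid DICOM modalities but fall outside the four supported ones, so the filter yields None for them.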

extract(path)

Extract the Modality for either a DICOM or a discrete medical image file.

Parameters:

path (str) – The path to the file to extract the Modality for.

Returns:

The extracted Modality or None.

Return type:

Optional[Modality]

is_enumerated_default_modality(modality)

Check if the specified modality is an enumerated default modality.

Parameters:

modality (Optional[Union[Modality, str]]) – The modality to check.

Returns:

True if the modality is an enumerated default modality, False otherwise.

Return type:

bool

class OrganExtractor

Bases: Extractor

A prototype class to extract an Organ from a discrete image file path. This class must be implemented by the user and is intended to be used with a Crawler for discrete image formats.

Important

If the file path does not specify a segmentation image the extractor must return None.

Example

Example of an OrganExtractor implementation which takes search strings and associated organ names to extract an Organ from a file path:

>>> import os
>>> from typing import (Optional, Tuple)
>>>
>>> from pyradise.fileio import OrganExtractor
>>> from pyradise.data import Organ
>>>
>>>
>>> class ExampleOrganExtractor(OrganExtractor):
>>>
>>>     def __init__(self,
>>>                  search_strings: Tuple[str, ...],
>>>                  names: Tuple[str, ...]
>>>                  ) -> None:
>>>         super().__init__()
>>>
>>>         # each search string is paired with the organ name at the same index
>>>         assert len(search_strings) == len(names), \
>>>             f'Number of search strings ({len(search_strings)}) must match the ' \
>>>             f'number of organ names ({len(names)})!'
>>>
>>>         self.search_strings = search_strings
>>>         self.names = names
>>>
>>>     def extract(self, path: str) -> Optional[Organ]:
>>>         # search only the file name, not the directories above it
>>>         file_name = os.path.basename(path)
>>>
>>>         for search_string, name in zip(self.search_strings, self.names):
>>>             if search_string in file_name:
>>>                 return Organ(name)
>>>
>>>         return None
extract(path)

Extract the Organ from the file path.

Parameters:

path (str) – The path to the file to extract the Organ for.

Returns:

The extracted Organ or None.

Return type:

Optional[Organ]

class SimpleOrganExtractor(organs)

Bases: OrganExtractor

A simple OrganExtractor implementation that searches for a provided set of organ names (organs) in the file name and generates an Organ with the same name. If no match is found None is returned.

Parameters:

organs (Tuple[str, ...]) – The possible organ names which will also be used to name the output Organ.

extract(path)

Extract the Organ from the file name using the provided organs. If no Organ can be extracted or the file does not contain a segmentation image None is returned.

Parameters:

path (str) – The path to the file to extract the Organ for.

Returns:

The extracted Organ or None.

Return type:

Optional[Organ]

class AnnotatorExtractor

Bases: Extractor

A prototype class to extract an Annotator from a discrete image file path. This class must be implemented by the user and is intended to be used with a Crawler for discrete image formats.

Important

If the file path does not specify a segmentation image the extractor must return None.

Example

Example of an AnnotatorExtractor implementation which takes search strings and associated annotator names to extract an Annotator from a file path:

>>> import os
>>> from typing import (Optional, Tuple)
>>>
>>> from pyradise.fileio import AnnotatorExtractor
>>> from pyradise.data import Annotator
>>>
>>>
>>> class ExampleAnnotatorExtractor(AnnotatorExtractor):
>>>
>>>     def __init__(self,
>>>                  search_strings: Tuple[str, ...],
>>>                  names: Tuple[str, ...]
>>>                  ) -> None:
>>>         super().__init__()
>>>
>>>         # each search string is paired with the annotator name at the same index
>>>         assert len(search_strings) == len(names), \
>>>             f'Number of search strings ({len(search_strings)}) must match the ' \
>>>             f'number of annotator names ({len(names)})!'
>>>
>>>         self.search_strings = search_strings
>>>         self.names = names
>>>
>>>     def extract(self, path: str) -> Optional[Annotator]:
>>>         # search only the file name, not the directories above it
>>>         file_name = os.path.basename(path)
>>>
>>>         for search_string, name in zip(self.search_strings, self.names):
>>>             if search_string in file_name:
>>>                 return Annotator(name)
>>>
>>>         return None
extract(path)

Extract the Annotator from the file path.

Parameters:

path (str) – The path to the file to extract the Annotator for.

Returns:

The extracted Annotator or None.

Return type:

Optional[Annotator]

class SimpleAnnotatorExtractor(annotators)

Bases: AnnotatorExtractor

A simple AnnotatorExtractor implementation that searches for a provided set of annotator names (annotators) in the file name and generates an Annotator with the same name. If no match is found, None is returned.

Parameters:

annotators (Tuple[str, ...]) – The possible annotator names which will also be used to name the output Annotator.

extract(path)

Extract the Annotator from the file name using the provided annotators. If no Annotator can be extracted or the file does not contain a segmentation image None is returned.

Parameters:

path (str) – The path to the file to extract the Annotator for.

Returns:

The extracted Annotator or None.

Return type:

Optional[Annotator]