Extraction Module
Module: pyradise.fileio.extraction
General
The extraction module provides class prototypes, simple implementations, and examples of extractors that retrieve information from file paths or DICOM files to construct Modality, Organ, and Annotator instances. Extractors are typically used in combination with a Crawler to retrieve the information required for Subject construction during loading.
When working with DICOM data, extractors provide an alternative to generating and maintaining modality configuration files. This alternative is especially useful if the data is well organized and the necessary information can easily be retrieved from it. If the data is heterogeneous or contains ambiguous content, we recommend using modality configuration files instead because they are more flexible.
Class Overview
The following abstract Extractor classes are provided by the extraction module:
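| Class | Description |
|---|---|
| Extractor | Base class for all extractors. |
| ModalityExtractor | Prototype class to extract a Modality from DICOM files and discrete image file paths. |
| OrganExtractor | Prototype class to extract an Organ from a discrete image file path. |
| AnnotatorExtractor | Prototype class to extract an Annotator from a discrete image file path. |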
The following concrete Extractor classes are provided by the extraction module:
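| Class | Description |
|---|---|
| SimpleModalityExtractor | A simple ModalityExtractor implementation. |
| SimpleOrganExtractor | A simple OrganExtractor implementation. |
| SimpleAnnotatorExtractor | A simple AnnotatorExtractor implementation. |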
Details
- class Extractor
Bases: ABC
An abstract base class for all extractors. An extractor extracts information about a file from its file path, its content, or any other source of data in order to provide identification information (e.g., the imaging modality of a certain NIfTI file). Extractors can be used in combination with a Crawler to extract the Modality, Organ, or Annotator instances for Subject construction.
Typically, the user needs to implement concrete extractor classes specific to the current task. This provides flexibility and supports a wide range of use cases. However, the user can also use the provided implementations and examples to get started quickly.
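To illustrate the extractor pattern independently of PyRadiSe, the following dependency-free sketch defines a hypothetical abstract base class `FileExtractor` with a single abstract `extract()` method and a concrete `KeywordExtractor` that recognizes a keyword in the file name. Both names are illustrative stand-ins, not the library's classes; they only show the split between an abstract interface and a task-specific implementation.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class FileExtractor(ABC):
    """Hypothetical stand-in for an abstract extractor base class."""

    @abstractmethod
    def extract(self, path: str) -> Optional[str]:
        """Return an identifier for the file at path, or None if unknown."""


class KeywordExtractor(FileExtractor):
    """Concrete extractor that looks for a fixed keyword in the file name."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword

    def extract(self, path: str) -> Optional[str]:
        # Only the file name is inspected, not the parent directories.
        file_name = os.path.basename(path)
        return self.keyword if self.keyword in file_name else None


extractor = KeywordExtractor('T1c')
print(extractor.extract('/data/patient_01/img_T1c.nii.gz'))  # T1c
print(extractor.extract('/data/patient_01/img_T2w.nii.gz'))  # None
```

Because `FileExtractor` declares `extract()` as abstract, it cannot be instantiated directly; only subclasses that implement the method can.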
- class ModalityExtractor(return_default=False)
Bases: Extractor
A prototype class to extract the Modality from DICOM files and discrete image file paths. It must be implemented by the user and is intended to be used with the Crawler types for DICOM and discrete image files. Thus, both abstract methods (i.e., extract_from_dicom() and extract_from_path()) need to be implemented. When working exclusively with DICOM files or exclusively with discrete image files, the unused extraction method may simply return None.
Important
If the file path does not specify an intensity image, the extractor must return None.
Warning
If return_default is set to True, the ModalityExtractor returns an enumerated default Modality for each file for which no modality could be extracted. As a result, no error is raised during loading. However, this functionality is intended exclusively for experimentation and debugging, so that the user can load data without implementing a complete extractor. It is not recommended for production use because subsequent errors may arise.
Notes
If the ModalityExtractor is used in combination with a Crawler, all paths to the discrete image files are provided sequentially to extract the Modality. When working with DICOM data, the Crawler provides just one arbitrary file path to the ModalityExtractor.
Example
Example of a ModalityExtractor implementation to identify detailed modalities:

```python
import os
from typing import Any, Dict, Optional

from pyradise.fileio import ModalityExtractor, Tag
from pyradise.data import Modality


class ExampleModalityExtractor(ModalityExtractor):

    @staticmethod
    def _get_mr_modality(ds_dict: Dict[str, Any]) -> Optional[Modality]:
        # check for different variants of attributes to get the sequence
        # identification
        scanning_sq = ds_dict.get('Scanning Sequence', {}).get('value', [])
        scanning_sq = [scanning_sq] if isinstance(scanning_sq, str) else scanning_sq
        contrast = ds_dict.get('Contrast/Bolus Agent', {}).get('value', '') or ''

        if all(val in scanning_sq for val in ('SE', 'IR')):
            return Modality('FLAIR')
        elif all(val in scanning_sq for val in ('GR', 'IR')) and len(contrast) > 0:
            return Modality('T1c')
        elif all(val in scanning_sq for val in ('GR', 'IR')) and len(contrast) == 0:
            return Modality('T1w')
        elif scanning_sq and all(val == 'SE' for val in scanning_sq):
            # a non-empty sequence consisting of spin echo only
            return Modality('T2w')
        else:
            return None

    def extract_from_dicom(self, path: str) -> Optional[Modality]:
        # extract the necessary attributes from the file
        tags = (Tag(0x0008, 0x0060),  # Modality
                Tag(0x0018, 0x0010),  # ContrastBolusAgent
                Tag(0x0018, 0x0020))  # ScanningSequence
        dataset_dict = self._load_dicom_attributes(tags, path)

        # identify the modality
        extracted_modality = dataset_dict.get('Modality', {}).get('value', None)
        if extracted_modality == 'CT':
            return Modality('CT')
        elif extracted_modality == 'MR':
            return self._get_mr_modality(dataset_dict)
        else:
            return None

    def extract_from_path(self, path: str) -> Optional[Modality]:
        # identify the modality from the file name
        # (T1c is checked before T1w so that the more specific name wins)
        file_name = os.path.basename(path)
        if 'T1c' in file_name:
            return Modality('T1c')
        elif 'T1w' in file_name:
            return Modality('T1w')
        elif 'T2w' in file_name:
            return Modality('T2w')
        elif 'FLAIR' in file_name:
            return Modality('FLAIR')
        elif 'CT' in file_name:
            return Modality('CT')
        else:
            return None
```
- Parameters:
return_default (bool) – Indicates if an enumerated default Modality should be returned if the extraction was not successful. Use this option exclusively for experimentation and debugging because it can cause subsequent errors (default: False).
- is_enumerated_default_modality(modality)
Check if the specified modality is an enumerated default modality.
- Parameters:
modality (Optional[Union[Modality, str]]) – The modality to check.
- Returns:
True if the modality is an enumerated default modality, False otherwise.
- Return type:
bool
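The interplay of return_default and is_enumerated_default_modality() can be pictured with the following dependency-free sketch. The class `DefaultingExtractor`, its toy extraction rule, and the naming scheme `UnknownModality_<n>` are hypothetical assumptions made for illustration; the names PyRadiSe actually generates may differ.

```python
import re
from typing import Optional


class DefaultingExtractor:
    """Hypothetical extractor that falls back to enumerated default names."""

    # Assumed naming scheme; not necessarily the one PyRadiSe uses.
    DEFAULT_PREFIX = 'UnknownModality_'

    def __init__(self, return_default: bool = False) -> None:
        self.return_default = return_default
        self._counter = 0

    def _extract(self, path: str) -> Optional[str]:
        # Toy rule: only paths containing 'CT' are recognized.
        return 'CT' if 'CT' in path else None

    def extract(self, path: str) -> Optional[str]:
        modality = self._extract(path)
        if modality is None and self.return_default:
            # Each unrecognized file gets a new, distinct default name.
            modality = f'{self.DEFAULT_PREFIX}{self._counter}'
            self._counter += 1
        return modality

    @classmethod
    def is_enumerated_default_modality(cls, modality: Optional[str]) -> bool:
        if modality is None:
            return False
        pattern = rf'{re.escape(cls.DEFAULT_PREFIX)}\d+'
        return re.fullmatch(pattern, modality) is not None


extractor = DefaultingExtractor(return_default=True)
names = [extractor.extract(p) for p in ('img_CT.nii.gz', 'img_a.nii.gz', 'img_b.nii.gz')]
print(names)  # ['CT', 'UnknownModality_0', 'UnknownModality_1']
print([DefaultingExtractor.is_enumerated_default_modality(n) for n in names])  # [False, True, True]
```

The counter keeps unrecognized files apart instead of merging them under one name, but the generated names carry no meaning, which is consistent with the warning that this option should only be used for debugging.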
- abstract extract_from_dicom(path)
Extract the Modality from the DICOM file at the specified path. If the modality cannot be detected, None must be returned.
Notes
For your implementation, you can load the DICOM file or specific DICOM attributes using the load_dataset() or load_dataset_tag() functions from the pyradise.utils module. For a detailed description of the DICOM attributes, we refer to the DICOM Standard and the DICOM Standard Browser.
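The Notes on ModalityExtractor describe how a Crawler feeds paths to the extractor: every discrete image path in turn, but only one arbitrary file of a DICOM series. The dependency-free sketch below mimics that calling pattern; `ToyModalityExtractor`, `crawl_discrete`, and `crawl_dicom_series` are hypothetical helpers, not PyRadiSe's Crawler implementation.

```python
import os
from typing import Dict, List, Optional


class ToyModalityExtractor:
    """Hypothetical extractor with the two methods a ModalityExtractor must provide."""

    def extract_from_dicom(self, path: str) -> Optional[str]:
        # Placeholder: a real implementation would read the DICOM 'Modality' attribute.
        return 'CT' if path.endswith('.dcm') else None

    def extract_from_path(self, path: str) -> Optional[str]:
        file_name = os.path.basename(path)
        for name in ('T1c', 'T2w', 'FLAIR'):
            if name in file_name:
                return name
        return None  # not an intensity image (e.g., a segmentation mask)


def crawl_discrete(paths: List[str], extractor: ToyModalityExtractor) -> Dict[str, str]:
    """Pass every discrete image path to the extractor, one after another."""
    result = {}
    for path in paths:
        modality = extractor.extract_from_path(path)
        if modality is not None:
            result[path] = modality
    return result


def crawl_dicom_series(series_files: List[str],
                       extractor: ToyModalityExtractor) -> Optional[str]:
    """Pass a single file (here: the first one) of a DICOM series to the extractor."""
    return extractor.extract_from_dicom(series_files[0]) if series_files else None


ext = ToyModalityExtractor()
print(crawl_discrete(['s1/T1c.nii.gz', 's1/T2w.nii.gz', 's1/seg_Brainstem.nii.gz'], ext))
# {'s1/T1c.nii.gz': 'T1c', 's1/T2w.nii.gz': 'T2w'}
print(crawl_dicom_series(['s1/ct/0001.dcm', 's1/ct/0002.dcm'], ext))  # CT
```

Since only one DICOM file is inspected per series, the attributes used for extraction should be ones that are identical across all files of the series.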
- class SimpleModalityExtractor(modalities, return_default=False)
Bases: ModalityExtractor
A simple ModalityExtractor implementation that determines the modality name from the ‘Modality’ attribute of the provided DICOM image or, for discrete image files, by searching the file name for one of the provided modality names (modalities), and generates a Modality with that name. If no match is found, None is returned.
- Parameters:
modalities (Tuple[str, ...]) – The possible modality names for the intensity files, which are also used to name the Modality.
return_default (bool) – Indicates if an enumerated default Modality should be returned if the extraction was not successful. Use this option exclusively for experimentation and debugging because it can cause subsequent errors (default: False).
- extract_from_path(path)
Extract the Modality from the file name using the provided modalities. If there is no match, None is returned.
- extract_from_dicom(path)
Extract the DICOM attribute ‘Modality’ from the provided DICOM file. If the ‘Modality’ attribute is missing or invalid, None is returned.
Notes
This method exclusively extracts the following top-level modalities: CT, MR, PT, and US. For all other values of the DICOM ‘Modality’ attribute, None is returned.
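The matching rules of SimpleModalityExtractor can be approximated by the dependency-free sketch below. The function names are hypothetical, the DICOM dataset is replaced by a plain dictionary of attribute values, and testing the modality names in the given order (first match wins) is an assumption rather than documented behavior.

```python
import os
from typing import Dict, Optional, Tuple

# Top-level DICOM modalities accepted by extract_from_dicom(), per the Notes above.
SUPPORTED_DICOM_MODALITIES = ('CT', 'MR', 'PT', 'US')


def modality_from_path(path: str, modalities: Tuple[str, ...]) -> Optional[str]:
    """Return the first configured modality name that occurs in the file name."""
    file_name = os.path.basename(path)
    for name in modalities:
        if name in file_name:
            return name
    return None


def modality_from_dicom_attributes(attributes: Dict[str, str]) -> Optional[str]:
    """Map the value of the DICOM 'Modality' attribute (0008,0060) to a name."""
    value = attributes.get('Modality')
    return value if value in SUPPORTED_DICOM_MODALITIES else None


print(modality_from_path('/data/s1/img_FLAIR.nii.gz', ('T1c', 'T2w', 'FLAIR')))  # FLAIR
print(modality_from_dicom_attributes({'Modality': 'MR'}))  # MR
print(modality_from_dicom_attributes({'Modality': 'RTSTRUCT'}))  # None
```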
- class OrganExtractor
Bases: Extractor
A prototype class to extract an Organ from a discrete image file path. This class must be implemented by the user and is intended to be used with a Crawler for discrete image formats.
Important
If the file path does not specify a segmentation image, the extractor must return None.
Example
Example of an OrganExtractor implementation that takes search strings and associated organ names to extract an Organ from a file path:

```python
import os
from typing import Optional, Tuple

from pyradise.fileio import OrganExtractor
from pyradise.data import Organ


class ExampleOrganExtractor(OrganExtractor):

    def __init__(self,
                 search_strings: Tuple[str, ...],
                 names: Tuple[str, ...]
                 ) -> None:
        super().__init__()

        # each search string needs exactly one associated organ name
        assert len(search_strings) == len(names), (
            f'Number of search strings ({len(search_strings)}) must match the '
            f'number of organ names ({len(names)})!')

        self.search_strings = search_strings
        self.names = names

    def extract(self, path: str) -> Optional[Organ]:
        # compare the search strings with the file name only
        file_name = os.path.basename(path)

        for search_string, name in zip(self.search_strings, self.names):
            if search_string in file_name:
                return Organ(name)

        # no match: the path does not specify a known segmentation
        return None
```
- class SimpleOrganExtractor(organs)
Bases: OrganExtractor
A simple OrganExtractor implementation that searches the file name for one of the provided organ names (organs) and generates an Organ with the same name. If no match is found, None is returned.
- Parameters:
organs (Tuple[str, ...]) – The possible organ names, which are also used to name the output Organ.
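Because the simple extractors search for the provided names within the file name, the order of the names matters when one name is contained in another. The dependency-free sketch below uses a hypothetical helper `match_name` to show this pitfall and one possible remedy, testing longer names first; whether SimpleOrganExtractor itself handles such overlaps is not stated in this documentation.

```python
import os
from typing import Optional, Sequence


def match_name(path: str, names: Sequence[str],
               longest_first: bool = False) -> Optional[str]:
    """Return the first name contained in the file name of path, or None."""
    file_name = os.path.basename(path)
    # sorted() is stable, so names of equal length keep their original order.
    candidates = sorted(names, key=len, reverse=True) if longest_first else names
    for name in candidates:
        if name in file_name:
            return name
    return None


organs = ('Eye', 'Eye_L', 'Eye_R')
path = '/data/s1/seg_Eye_L.nii.gz'
print(match_name(path, organs))                      # Eye (shorter name matches first)
print(match_name(path, organs, longest_first=True))  # Eye_L
```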
- class AnnotatorExtractor
Bases: Extractor
A prototype class to extract an Annotator from a discrete image file path. This class must be implemented by the user and is intended to be used with a Crawler for discrete image formats.
Important
If the file path does not specify a segmentation image, the extractor must return None.
Example
Example of an AnnotatorExtractor implementation that takes search strings and associated annotator names to extract an Annotator from a file path:

```python
import os
from typing import Optional, Tuple

from pyradise.fileio import AnnotatorExtractor
from pyradise.data import Annotator


class ExampleAnnotatorExtractor(AnnotatorExtractor):

    def __init__(self,
                 search_strings: Tuple[str, ...],
                 names: Tuple[str, ...]
                 ) -> None:
        super().__init__()

        # each search string needs exactly one associated annotator name
        assert len(search_strings) == len(names), (
            f'Number of search strings ({len(search_strings)}) must match the '
            f'number of annotator names ({len(names)})!')

        self.search_strings = search_strings
        self.names = names

    def extract(self, path: str) -> Optional[Annotator]:
        # compare the search strings with the file name only
        file_name = os.path.basename(path)

        for search_string, name in zip(self.search_strings, self.names):
            if search_string in file_name:
                return Annotator(name)

        # no match: the path does not specify a known segmentation
        return None
```
- class SimpleAnnotatorExtractor(annotators)
Bases: AnnotatorExtractor
A simple AnnotatorExtractor implementation that searches the file name for one of the provided annotator names (annotators) and generates an Annotator with the same name. If no match is found, None is returned.
- Parameters:
annotators (Tuple[str, ...]) – The possible annotator names, which are also used to name the output Annotator.
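Organ and annotator extraction typically operate on the same segmentation file path. The dependency-free sketch below assumes a hypothetical naming convention `seg_<Organ>_<Annotator>.nii.gz`; the helper `parse_segmentation_name` is illustrative only. An OrganExtractor and an AnnotatorExtractor implementation could each call it and return the respective element, and it returns None for files that are not segmentations, as required by the Important notes of both prototype classes.

```python
import os
from typing import Optional, Tuple


def parse_segmentation_name(path: str) -> Optional[Tuple[str, str]]:
    """Split 'seg_<Organ>_<Annotator>.nii.gz' into (organ, annotator).

    Returns None for any other file, e.g., an intensity image.
    """
    file_name = os.path.basename(path)
    prefix, suffix = 'seg_', '.nii.gz'
    if not (file_name.startswith(prefix) and file_name.endswith(suffix)):
        return None
    stem = file_name[len(prefix):-len(suffix)]
    # rpartition splits at the last underscore, so organ names such as
    # 'Eye_L' keep their own underscore.
    organ, sep, annotator = stem.rpartition('_')
    if not sep or not organ or not annotator:
        return None
    return organ, annotator


print(parse_segmentation_name('/data/s1/seg_Eye_L_ExpertA.nii.gz'))  # ('Eye_L', 'ExpertA')
print(parse_segmentation_name('/data/s1/img_T1c.nii.gz'))            # None
```

Splitting at the last underscore assumes that annotator names never contain underscores; a different convention would need a different split rule.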