micro_dl.utils module
Submodules
micro_dl.utils.aux_utils module
- micro_dl.utils.aux_utils.adjust_slice_margins(slice_ids, depth)
Adjusts slice indices to given z depth by removing indices too close to boundaries. Assumes that slice indices are contiguous.
- Parameters:
slice_ids (list of ints) – Slice (z) indices
depth (int) – Number of z slices
- Return list slice_ids:
Slice indices with adjusted margins
- Raises:
AssertionError: if depth is even
- Raises:
AssertionError: if there aren’t enough slice ids for given depth
- Raises:
AssertionError: if slices aren’t contiguous
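The margin adjustment described above can be sketched as follows. This is a minimal illustration rather than the micro_dl implementation: the name `trim_slice_margins` is hypothetical, and the sketch assumes that `depth // 2` indices are dropped from each end of the list.

```python
def trim_slice_margins(slice_ids, depth):
    """Hypothetical sketch: drop depth // 2 indices from each end."""
    assert depth % 2 == 1, "depth must be odd"
    margin = depth // 2
    if margin == 0:
        return list(slice_ids)
    assert len(slice_ids) > 2 * margin, "not enough slice ids for this depth"
    contiguous = all(b - a == 1 for a, b in zip(slice_ids, slice_ids[1:]))
    assert contiguous, "slice ids must be contiguous"
    return list(slice_ids[margin:-margin])


# With depth 5, two indices are removed at each boundary
trimmed = trim_slice_margins(list(range(10)), depth=5)
print(trimmed)
```

Only the remaining indices can serve as the center of a full stack of `depth` slices, which is why the boundary indices are removed.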
- micro_dl.utils.aux_utils.convert_channel_names_to_ids(channel_map, channel_list)
Assuming you have a dict from get_channels and a list of channel names, you get a list of channel indices.
- Parameters:
channel_map (dict) – Channel names with indices
channel_list (list) – List of channel names (subset of channel_map). If the list contains ints, it is returned as is.
- Return list channel_ids:
List of (int) channel indices
- Raises:
AssertionError – if any channel in list is not in channel_map
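A minimal sketch of the name-to-index conversion, assuming that channel_map is keyed by channel name with the index as value. The function name `channel_names_to_ids` and the example channel names are hypothetical.

```python
def channel_names_to_ids(channel_map, channel_list):
    """Hypothetical sketch: map channel names to their indices."""
    # A list of ints is assumed to hold indices already and is returned as is
    if all(isinstance(c, int) for c in channel_list):
        return list(channel_list)
    missing = [c for c in channel_list if c not in channel_map]
    assert not missing, "channels not in channel_map: {}".format(missing)
    return [channel_map[c] for c in channel_list]


channel_map = {"phase": 0, "nuclei": 1, "membrane": 2}  # example names only
ids = channel_names_to_ids(channel_map, ["membrane", "phase"])
print(ids)
```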
- micro_dl.utils.aux_utils.get_channel_axis(data_format)
Get the channel axis given the data format
- Parameters:
data_format (str) – Data format, either ‘channels_first’ or ‘channels_last’
- Return int channel_axis:
Channel axis for the given data format
- micro_dl.utils.aux_utils.get_channels(frames_meta)
Load frames metadata from directory, find channel names and their corresponding indices.
- Parameters:
frames_meta (pd.DataFrame) – Metadata for frames
- Return dict channel_map:
Channel name and corresponding index
- Raises:
AssertionError – if channel name column is incompletely populated
- micro_dl.utils.aux_utils.get_im_name(time_idx=None, channel_idx=None, slice_idx=None, pos_idx=None, extra_field=None, ext='.png', int2str_len=3)
Create an image name given parameters and extension
- Parameters:
time_idx (int) – Time index
channel_idx (int) – Channel index
slice_idx (int) – Slice (z) index
pos_idx (int) – Position (FOV) index
extra_field (str) – Any extra string you want to include in the name
ext (str) – Extension, e.g. ‘.png’ or ‘.npy’
int2str_len (int) – Length of string of the converted integers
- Return str im_name:
Image file name
- micro_dl.utils.aux_utils.get_meta_idx(frames_metadata, time_idx, channel_idx, slice_idx, pos_idx)
Get row index in metadata dataframe given variable indices
- Parameters:
frames_metadata (dataframe) – Dataframe with column names given below
time_idx (int) – Timepoint index
channel_idx (int) – Channel index
slice_idx (int) – Slice (z) index
pos_idx (int) – Position (FOV) index
- Returns:
int pos_idx: Row position matching indices above
- micro_dl.utils.aux_utils.get_row_idx(frames_metadata, time_idx, channel_idx, slice_idx=-1, pos_idx=-1, dir_names=None)
Get the indices for images with timepoint_idx and channel_idx
- Parameters:
frames_metadata (pd.DataFrame) – DF with columns [time_idx, channel_idx, slice_idx, file_name]
time_idx (int) – get info for this timepoint
channel_idx (int) – get info for this channel
slice_idx (int) – get info for this focal plane (2D)
pos_idx (int) – Specify FOV (default to all if -1)
dir_names (str) – Directory names if not in dataframe?
- Return row_idx:
Row index in dataframe
- micro_dl.utils.aux_utils.get_sms_im_name(time_idx=None, channel_name=nan, slice_idx=None, pos_idx=None, extra_field=None, ext='.tiff', int2str_len=3)
Create an image name given parameters and extension. This function is custom for the computational microscopy (SMS) group, whose file naming convention is assumed to be: img_channelname_t***_p***_z***_extrafield.tif. This function will alter list and dict in place.
- Parameters:
time_idx (int) – Time index
channel_name (str/NaN) – Channel name
slice_idx (int) – Slice (z) index
pos_idx (int) – Position (FOV) index
extra_field (str) – Any extra string you want to include in the name
ext (str) – Extension starting with period, default ‘.tiff’
int2str_len (int) – Length of string of the converted integers
- Return str im_name:
Image file name
- micro_dl.utils.aux_utils.get_sorted_names(dir_name)
Get image names in directory and sort them by their indices
- Parameters:
dir_name (str) – Image directory name
- Return list of strs im_names:
Image names sorted according to indices
- micro_dl.utils.aux_utils.get_sub_meta(frames_metadata, time_ids, channel_ids, slice_ids, pos_ids)
Get sliced metadata dataframe given variable indices
- Parameters:
frames_metadata (dataframe) – Dataframe with column names given below
time_ids (int/list) – Timepoint indices
channel_ids (int/list) – Channel indices
slice_ids (int/list) – Slice (z) indices
pos_ids (int/list) – Position (FOV) indices
- Returns:
int pos_ids: Row positions matching indices above
- micro_dl.utils.aux_utils.import_object(module_name, obj_name, obj_type='class')
Imports a class or function dynamically
- Parameters:
module_name (str) – modules such as input, utils, train etc
obj_name (str) – Object to find
obj_type (str) – Object type (class or function)
- micro_dl.utils.aux_utils.init_logger(logger_name, log_fname, log_level)
Creates a logger instance
- Parameters:
logger_name (str) – name of the logger instance
log_fname (str) – fname with full path of the log file
log_level (int) – specifies the logging level: NOTSET:0, DEBUG:10, INFO:20, WARNING:30, ERROR:40, CRITICAL:50
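The integer levels listed above are those of Python's standard logging module. The following sketch shows how a file logger with such a level might be set up; `make_file_logger`, the log format and the temporary log path are assumptions for illustration, not the micro_dl code.

```python
import logging
import os
import tempfile


def make_file_logger(logger_name, log_fname, log_level):
    """Hypothetical sketch of a logger that writes to a file."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(log_level)
    handler = logging.FileHandler(log_fname)
    handler.setLevel(log_level)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "example.log")
logger = make_file_logger("example_logger", log_path, logging.INFO)  # INFO is 20
logger.debug("dropped: below the INFO threshold")
logger.info("kept: at the INFO threshold")
for handler in logger.handlers:
    handler.flush()
```

Messages below the chosen level are discarded, so passing 20 (INFO) keeps INFO, WARNING, ERROR and CRITICAL messages.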
- micro_dl.utils.aux_utils.make_dataframe(nbr_rows=None, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'])
Create empty frames metadata pandas dataframe given number of rows and standard column names defined below
- Parameters:
nbr_rows ([None, int]) – The number of rows in the dataframe
df_names (list) – Dataframe column names
- Return dataframe frames_meta:
Empty dataframe with given indices and column names
- micro_dl.utils.aux_utils.parse_idx_from_name(im_name, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'], dir_name=None, order='cztp')
Parse indices from an image file name, assumed to look like e.g. im_c***_z***_t***_p***.png. The extension and the number of digits per index don’t matter: all integers are extracted from the image file name and assigned by order. By default the order is assumed to be c, z, t, p.
- Parameters:
im_name (str) – Image name without path
df_names (list of strs) – Dataframe col names
dir_name (str) – Directory path
order (str) – Order in which c, z, t, p are given in the image (4 chars)
- Return dict meta_row:
One row of metadata given image file name
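Extracting all integers from a name and assigning them by order could look roughly like this. The sketch is an assumption-laden illustration: `parse_indices_by_order` is a hypothetical name, it returns only the four index fields, and it strips the extension first so that digits in an extension are not counted.

```python
import re


def parse_indices_by_order(im_name, order="cztp"):
    """Hypothetical sketch: assign the integers in a file name by position."""
    key_for = {"c": "channel_idx", "z": "slice_idx", "t": "time_idx", "p": "pos_idx"}
    # Assumption of this sketch: drop the extension before searching for digits
    stem = im_name.rsplit(".", 1)[0]
    ints = [int(s) for s in re.findall(r"\d+", stem)]
    assert len(ints) == len(order), "expected {} integers".format(len(order))
    return {key_for[ch]: val for ch, val in zip(order, ints)}


row = parse_indices_by_order("im_c001_z012_t005_p003.png")
print(row)
```

Because assignment is purely positional, the letters in the file name are ignored and a wrong `order` string silently swaps indices.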
- micro_dl.utils.aux_utils.parse_sms_name(im_name, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'], dir_name=None, channel_names=[])
Parse metadata from file name or file path. This function is custom for the computational microscopy (SMS) group, whose file naming convention is assumed to be: img_channelname_t***_p***_z***.tif. This function will alter list and dict in place.
- Parameters:
im_name (str) – File name or path
df_names (list of strs) – Dataframe col names
dir_name (str) – Directory path
channel_names (list[str]) – Expanding list of channel names
- Return dict meta_row:
One row of metadata given image file name
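A sketch of parsing the naming convention above with a regular expression, including the in-place growth of the channel name list. `parse_sms_like_name` is a hypothetical name, and the sketch assumes a bare file name (no directory) and that a channel's index is the order in which its name was first seen.

```python
import re


def parse_sms_like_name(im_name, channel_names):
    """Hypothetical sketch for names like img_channelname_t***_p***_z***.tif."""
    pattern = r"img_(?P<chan>.+)_t(?P<t>\d+)_p(?P<p>\d+)_z(?P<z>\d+)"
    match = re.match(pattern, im_name)
    assert match is not None, "name does not follow the assumed convention"
    chan = match.group("chan")
    # The list grows in place, so a channel's index is its first-seen order
    if chan not in channel_names:
        channel_names.append(chan)
    return {
        "channel_name": chan,
        "channel_idx": channel_names.index(chan),
        "time_idx": int(match.group("t")),
        "pos_idx": int(match.group("p")),
        "slice_idx": int(match.group("z")),
        "file_name": im_name,
    }


seen_channels = []
row_a = parse_sms_like_name("img_phase_t000_p002_z010.tif", seen_channels)
row_b = parse_sms_like_name("img_Retardance_t001_p002_z010.tif", seen_channels)
print(seen_channels)
```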
- micro_dl.utils.aux_utils.read_config(config_fname)
Read the config file in yml format. TODO: validate config!
- Parameters:
config_fname (str) – fname of config yaml with its full path
- Returns:
dict config: Configuration parameters
- micro_dl.utils.aux_utils.read_json(json_filename)
Read JSON file and validate schema
- Parameters:
json_filename (str) – json file name
- Returns:
dict json_object: JSON object
- Raises:
FileNotFoundError: if file can’t be read
- Raises:
JSONDecodeError: if file is not in json format
- micro_dl.utils.aux_utils.read_meta(input_dir, meta_fname='frames_meta.csv')
Read metadata file, which is assumed to be named ‘frames_meta.csv’ in given directory.
- Parameters:
input_dir (str) – Directory containing data and metadata
meta_fname (str) – Metadata file name
- Return dataframe frames_metadata:
Metadata for all frames
- Raises:
IOError: If metadata file isn’t present
- micro_dl.utils.aux_utils.save_tile_meta(tiles_meta, cur_channel, tiled_dir)
Save meta data for tiled images
- Parameters:
tiles_meta (list) – List of tuples holding meta info for tiled images
cur_channel (int) – Channel being tiled
tiled_dir (str) – Directory to save meta data in
- micro_dl.utils.aux_utils.sort_meta_by_channel(frames_metadata)
Rearrange metadata dataframe so that file names, originally listed for all channels in the same column, are moved to a separate column for each channel.
- Parameters:
frames_metadata (dataframe) – Metadata with one column named ‘file_name’
- Return dataframe sorted_metadata:
Metadata with separate file_name_X for channel X.
- micro_dl.utils.aux_utils.validate_config(config_dict, params)
Check if the required params are present in config
- Parameters:
config_dict (dict) – dictionary with params as keys
params (list) – list of strings with expected params
- Returns:
list with bool values indicating if param is present or not
- micro_dl.utils.aux_utils.validate_indices(frames_meta, preprocess_config, idx_type)
Helper function to check if a list of position, time or slice indices in the preprocessing config exist in the frames metadata. If not, use all indices in metadata.
- Parameters:
frames_meta (pd.DataFrame) – Metadata for all images
preprocess_config (dict) – Preprocessing config
idx_type (str) – Type of index: ‘pos’, ‘time’, ‘slice’
- Return list use_ids:
Indices to be used in preprocessing
- Raises:
AssertionError – If indices in preprocess config is not a subset of those found in frames metadata
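The fallback logic can be sketched without pandas. The sketch reads the description as "if the config specifies no indices, use all indices found in the metadata"; that reading, the name `choose_indices` and the plain-list inputs are assumptions.

```python
def choose_indices(available_ids, requested_ids=None):
    """Hypothetical sketch: use requested ids if given, else all available."""
    available = sorted(set(available_ids))
    if requested_ids is None:
        return available
    not_found = set(requested_ids) - set(available)
    assert not not_found, "indices not in metadata: {}".format(sorted(not_found))
    return list(requested_ids)


meta_pos_ids = [0, 0, 1, 1, 2, 2]  # e.g. the pos_idx column of the metadata
use_all = choose_indices(meta_pos_ids)
use_some = choose_indices(meta_pos_ids, requested_ids=[0, 2])
print(use_all, use_some)
```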
- micro_dl.utils.aux_utils.validate_metadata_indices(frames_metadata, time_ids=None, channel_ids=None, slice_ids=None, pos_ids=None, uniform_structure=True)
Check the availability of the provided timepoint, channel, position and slice indices for all data. If input ids are None, the indices for that parameter will not be evaluated. If input ids are -1, all indices for that parameter will be returned.
- Parameters:
frames_metadata (pd.DataFrame) – DF with columns [time_idx, channel_idx, slice_idx, pos_idx, file_name]
time_ids (int/list) – check availability of these timepoints in frames_metadata
channel_ids (int/list) – check availability of these channels in frames_metadata
pos_ids (int/list) – Check availability of positions in metadata
slice_ids (int/list) – Check availability of z slices in metadata
uniform_structure (bool) – Indicates whether the data has a uniform structure, i.e. no unequal quantities in any of the ids (channel, time, slice, pos)
- Return dict metadata_ids:
All indices found given input
- Raises:
AssertionError: If not all channels, timepoints, positions or slices are present
- micro_dl.utils.aux_utils.write_json(json_dict, json_filename)
Writes dict as json file.
- Parameters:
json_dict (dict) – Dictionary to be written
json_filename (str) – Full path file name of json
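A round trip through the standard json module illustrates the write and read steps and where the documented exceptions come from. The `_sketch` function names and the temporary file path are hypothetical, and the sketch performs no schema validation.

```python
import json
import os
import tempfile


def write_json_sketch(json_dict, json_filename):
    """Write a dict to disk as JSON."""
    with open(json_filename, "w") as json_file:
        json.dump(json_dict, json_file, indent=2)


def read_json_sketch(json_filename):
    """Read a JSON file.

    open() raises FileNotFoundError for a missing file and
    json.load() raises json.JSONDecodeError for malformed content.
    """
    with open(json_filename, "r") as json_file:
        return json.load(json_file)


json_path = os.path.join(tempfile.mkdtemp(), "example.json")
write_json_sketch({"channel_ids": [0, 1], "depth": 5}, json_path)
loaded = read_json_sketch(json_path)
print(loaded)
```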
micro_dl.utils.image_utils module
Utility functions for processing images
- micro_dl.utils.image_utils.apply_flat_field_correction(input_image, **kwargs)
Apply flat field correction.
- Parameters:
input_image (np.array) – image to be corrected
**kwargs – See below
- Returns:
np.array (float) corrected image
- Keyword arguments:
flat_field_image (np.float) – flat_field_image for correction OR
flat_field_path (str) – Full path to flatfield image
- micro_dl.utils.image_utils.center_crop_to_shape(input_image, output_shape, image_format='zyx')
Center crop the image to a given shape
- Parameters:
input_image (np.array) – input image to be cropped
output_shape (list) – desired crop shape
image_format (str) – Image format; zyx or xyz
- Return np.array center_block:
Center of input image with output shape
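Center cropping amounts to computing a start offset per dimension and slicing. The following NumPy sketch ignores the image_format argument and uses the hypothetical name `center_crop_sketch`; it is an illustration, not the library code.

```python
import numpy as np


def center_crop_sketch(input_image, output_shape):
    """Hypothetical sketch: take the central block of the requested shape."""
    in_shape = np.array(input_image.shape)
    out_shape = np.array(output_shape)
    assert len(in_shape) == len(out_shape), "shapes must have the same number of dims"
    assert np.all(out_shape <= in_shape), "output shape exceeds input shape"
    starts = (in_shape - out_shape) // 2
    slices = tuple(slice(int(s), int(s + o)) for s, o in zip(starts, out_shape))
    return input_image[slices]


volume = np.arange(4 * 6 * 6).reshape(4, 6, 6)
center_block = center_crop_sketch(volume, [2, 4, 4])
print(center_block.shape)
```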
- micro_dl.utils.image_utils.crop2base(im, base=2)
Crop image to nearest smaller factor of the base (usually 2). Assumes xyz format; it will work for zyx too, but x_shape, y_shape and z_shape will then be z_shape, y_shape and x_shape respectively.
- Parameters:
im (nd.array) – Image
base (int) – Base to use, typically 2
crop_z (bool) – crop along z dim, only for UNet3D
- Return nd.array im:
Cropped image
- Raises:
AssertionError: if base is less than zero
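One possible reading of "nearest smaller factor of the base" is the largest power of the base that fits in each dimension. The sketch below makes that assumption explicit, crops only the first two dimensions around the center, and requires base > 1 to keep its loop finite; the function names are hypothetical.

```python
import numpy as np


def largest_power(size, base):
    """Largest power of base that does not exceed size (integer arithmetic)."""
    power = 1
    while power * base <= size:
        power *= base
    return power


def crop_to_power_of_base(im, base=2):
    """Hypothetical sketch: center crop the first two dims to a power of base."""
    assert base > 1, "this sketch needs base > 1"
    new_rows = largest_power(im.shape[0], base)
    new_cols = largest_power(im.shape[1], base)
    row0 = (im.shape[0] - new_rows) // 2
    col0 = (im.shape[1] - new_cols) // 2
    return im[row0:row0 + new_rows, col0:col0 + new_cols, ...]


cropped = crop_to_power_of_base(np.zeros((300, 270, 5)))
print(cropped.shape)
```

Integer multiplication is used instead of a floating point logarithm so that exact powers of the base are not rounded down by accident.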
- micro_dl.utils.image_utils.fit_polynomial_surface_2D(sample_coords, sample_values, im_shape, order=2, normalize=True)
Given coordinates and corresponding values, this function will fit a 2D polynomial of given order, then create a surface of given shape.
- Parameters:
sample_coords (np.array) – 2D sample coords (nbr of points, 2)
sample_values (np.array) – Corresponding intensity values (nbr points,)
im_shape (tuple) – Shape of desired output surface (height, width)
order (int) – Order of polynomial (default 2)
normalize (bool) – Normalize surface by dividing by its mean for flatfield correction (default True)
- Return np.array poly_surface:
2D surface of shape im_shape
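A polynomial surface fit of this kind can be written as a linear least squares problem over the monomials r^i * c^j with i + j <= order. The sketch below uses NumPy's lstsq; the name `fit_surface_sketch`, the (row, column) coordinate convention and the choice of monomials are assumptions.

```python
import numpy as np


def fit_surface_sketch(sample_coords, sample_values, im_shape, order=2, normalize=True):
    """Hypothetical sketch: least squares fit of a 2D polynomial surface."""
    # Exponent pairs (i, j) with total degree i + j <= order
    exps = [(i, j) for i in range(order + 1) for j in range(order + 1) if i + j <= order]

    def design(rows, cols):
        # One column (or plane) per monomial rows**i * cols**j
        return np.stack([rows ** i * cols ** j for i, j in exps], axis=-1)

    rows = sample_coords[:, 0].astype(float)
    cols = sample_coords[:, 1].astype(float)
    coeffs, *_ = np.linalg.lstsq(design(rows, cols), sample_values, rcond=None)
    # Evaluate the fitted polynomial at every pixel of the output surface
    rr, cc = np.meshgrid(
        np.arange(im_shape[0], dtype=float),
        np.arange(im_shape[1], dtype=float),
        indexing="ij",
    )
    surface = design(rr, cc) @ coeffs
    if normalize:
        surface = surface / surface.mean()
    return surface


# Samples taken from a known plane so that the fit can be checked
grid_r, grid_c = np.meshgrid(np.arange(0, 20, 5), np.arange(0, 30, 5), indexing="ij")
coords = np.stack([grid_r.ravel(), grid_c.ravel()], axis=1)
values = 1.0 + 0.01 * coords[:, 0] + 0.02 * coords[:, 1]
surface = fit_surface_sketch(coords, values, (20, 30), order=2, normalize=False)
normalized = fit_surface_sketch(coords, values, (20, 30), order=2, normalize=True)
```

Dividing by the mean gives a surface with mean 1, so using it for flatfield correction leaves the overall intensity scale roughly unchanged.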
- micro_dl.utils.image_utils.get_flat_field_path(flat_field_dir, channel_idx, channel_ids)
Given channel and flatfield dir, check that the corresponding flatfield is present and return its path.
- Parameters:
flat_field_dir (str) – Flatfield directory
channel_idx (int) – Channel index for flatfield
channel_ids (list) – All channel indices being processed
- micro_dl.utils.image_utils.grid_sample_pixel_values(im, grid_spacing)
Sample pixel values in the input image at the grid. Any incomplete grids (remainders of modulus operation) will be ignored.
- Parameters:
im (np.array) – 2D image
grid_spacing (int) – spacing of the grid
- Return int row_ids:
row indices of the grids
- Return int col_ids:
column indices of the grids
- Return np.array sample_values:
sampled pixel values
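Grid sampling can be sketched with NumPy index arrays. The sketch assumes that grid nodes start one spacing in from the top-left corner, so that partial cells at the image edges contribute no samples; `grid_sample_sketch` is a hypothetical name.

```python
import numpy as np


def grid_sample_sketch(im, grid_spacing):
    """Hypothetical sketch: sample pixel values at regular grid nodes."""
    assert im.ndim == 2, "expects a 2D image"
    # Assumption: nodes start one spacing in from the top-left corner
    rows = np.arange(grid_spacing, im.shape[0], grid_spacing)
    cols = np.arange(grid_spacing, im.shape[1], grid_spacing)
    row_grid, col_grid = np.meshgrid(rows, cols, indexing="ij")
    row_ids = row_grid.ravel()
    col_ids = col_grid.ravel()
    return row_ids, col_ids, im[row_ids, col_ids]


image = np.arange(100).reshape(10, 10)
row_ids, col_ids, sample_values = grid_sample_sketch(image, grid_spacing=4)
print(sample_values)
```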
- micro_dl.utils.image_utils.im_adjust(img, tol=1, bit=8)
Adjust contrast of the image
- micro_dl.utils.image_utils.im_bit_convert(im, bit=16, norm=False, limit=[])
- micro_dl.utils.image_utils.preprocess_image(im, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)
Do histogram clipping, z score normalization, and potentially binarization.
- Parameters:
im (np.array) – Image (stack)
hist_clip_limits (tuple) – Percentile histogram clipping limits
is_mask (bool) – True if mask
normalize_im (str/None) – Normalization, if any
zscore_mean (float/None) – Data mean
zscore_std (float/None) – Data std
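Percentile clipping followed by z-scoring can be sketched as below. Mask binarization and the normalize_im options are left out, the fallback to the image's own mean and std is an assumption, and `clip_and_zscore` is a hypothetical name.

```python
import numpy as np


def clip_and_zscore(im, hist_clip_limits=None, zscore_mean=None, zscore_std=None):
    """Hypothetical sketch: percentile clipping followed by z-scoring."""
    im = im.astype(np.float64)
    if hist_clip_limits is not None:
        # Limits are percentiles, e.g. (1, 99)
        low, high = np.percentile(im, hist_clip_limits)
        im = np.clip(im, low, high)
    mean = im.mean() if zscore_mean is None else zscore_mean
    std = im.std() if zscore_std is None else zscore_std
    # Small epsilon guards against division by zero on constant images
    return (im - mean) / (std + np.finfo(float).eps)


rng = np.random.default_rng(0)
raw = rng.normal(loc=100.0, scale=10.0, size=(64, 64))
processed = clip_and_zscore(raw, hist_clip_limits=(1, 99))
```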
- micro_dl.utils.image_utils.preprocess_imstack(frames_metadata, depth, time_idx, channel_idx, slice_idx, pos_idx, dir_name=None, flat_field_path=None, hist_clip_limits=None, normalize_im='stack')
Preprocess image given by indices: flatfield correction, histogram clipping and z-score normalization are performed.
- Parameters:
frames_metadata (pd.DataFrame) – DF with meta info for all images
depth (int) – num of slices in stack if 2.5D or depth for 3D
time_idx (int) – Time index
channel_idx (int) – Channel index
slice_idx (int) – Slice (z) index
pos_idx (int) – Position (FOV) index
dir_name (str/None) – Image directory (none if using the frames_meta dir_name)
flat_field_path (str) – Path to flat field image for channel
hist_clip_limits (list) – Limits for histogram clipping (size 2)
normalize_im (str or None) – options to z-score the image
- Return np.array im:
3D preprocessed image
- micro_dl.utils.image_utils.read_image(file_path)
Read 2D grayscale image from file. Checks file extension for npy and loads the array if true. Otherwise reads a regular image using OpenCV (png, tif, jpg, see OpenCV for supported files) of any bit depth.
- Parameters:
file_path (str) – Full path to image
- Return array im:
2D image
- Raises:
IOError: if image can’t be opened
- micro_dl.utils.image_utils.read_image_from_row(meta_row, dir_name=None)
Read 2D grayscale image from file. Checks file extension for npy and loads the array if true. Otherwise reads a regular image using OpenCV (png, tif, jpg, see OpenCV for supported files) of any bit depth.
- Parameters:
meta_row (pd.DataFrame) – Row in metadata
dir_name (str/None) – Directory containing images (none if using frames meta dir_name)
- Return array im:
2D image
- Raises:
IOError: if image can’t be opened
- micro_dl.utils.image_utils.read_imstack(input_fnames, flat_field_fnames=None, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)
Reads the images in the fnames and assembles a stack. If images are masks, makes sure they’re boolean by setting >0 to True
- Parameters:
input_fnames (tuple/list) – Paths to input files
flat_field_fnames (str/list) – Path(s) to flat field image(s)
hist_clip_limits (tuple) – limits for histogram clipping
is_mask (bool) – Indicator for if files contain masks
normalize_im (bool/None) – Whether to zscore normalize im stack
zscore_mean (float) – mean for z-scoring the image
zscore_std (float) – std for z-scoring the image
- Return np.array:
Input stack, flat-field corrected and z-scored if regular images, boolean if they’re masks
- micro_dl.utils.image_utils.read_imstack_from_meta(frames_meta_sub, dir_name=None, flat_field_fnames=None, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)
Reads images (>1) from metadata rows and assembles a stack. If images are masks, makes sure they’re boolean by setting >0 to True
- Parameters:
frames_meta_sub (pd.DataFrame) – Selected subvolume to be read
dir_name (str/None) – Directory path (none if using dir in frames_meta)
flat_field_fnames (str/list) – Path(s) to flat field image(s)
hist_clip_limits (tuple) – Percentile limits for histogram clipping
is_mask (bool) – Indicator for if files contain masks
normalize_im (bool/None) – Whether to zscore normalize im stack
zscore_mean (float) – mean for z-scoring the image
zscore_std (float) – std for z-scoring the image
- Return np.array:
Input stack, flat-field corrected and z-scored if regular images, boolean if they’re masks
- micro_dl.utils.image_utils.rescale_image(im, scale_factor)
Rescales a 2D image equally in x and y given a scale factor. Uses bilinear interpolation (the OpenCV default).
- Parameters:
im (np.array) – 2D image
scale_factor (float) – Scale factor applied equally in x and y
- Return np.array:
2D image resized by scale factor
- micro_dl.utils.image_utils.rescale_nd_image(input_volume, scale_factor)
Rescale a nd array, mainly used for 3D volume
For non-integer dims, the values are rounded to the closest int. A fractional part of exactly 0.5 is ambiguous: when downsampling the value gets floored, and when upsampling it gets rounded up to the next int.
- Parameters:
input_volume (np.array) – 3D stack
scale_factor (float/list) – if scale_factor is a float, scale all dimensions by this. Else scale_factor has to be specified for each dimension in a list or tuple
- Return np.array res_volume:
rescaled volume
- micro_dl.utils.image_utils.resize_image(input_image, output_shape)
Resize image to a specified shape
- Parameters:
input_image (np.ndarray) – image to be resized
output_shape (tuple/np.array) – desired shape of the output image
- Returns:
np.array, resized image
- micro_dl.utils.image_utils.resize_mask(input_image, target_size)
Resample label/bool images
micro_dl.utils.io_utils module
- class micro_dl.utils.io_utils.DefaultZarr(store, root_path)
Bases:
WriterBase
This writer is based off creating a default HCS hierarchy for non-HCS datasets. Currently, all positions live under individual columns under a single row, i.e. this produces the following structure:
Dataset.zarr
    ---> Row_0
        ---> Col_0
            ---> Pos_000
        ...
        ---> Col_N
            ---> Pos_N
We assume this structure in the metadata updating/position creation
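The single-row layout above implies a simple mapping from position index to group path. The sketch below builds such a map; the `Pos_{:03d}` naming pattern is inferred from the diagram and `default_position_paths` is a hypothetical helper, not part of the writer classes.

```python
def default_position_paths(position_names):
    """Hypothetical sketch: one column per position, all under a single row."""
    paths = {}
    for idx, name in enumerate(position_names):
        paths[idx] = "/".join(["Row_0", "Col_{}".format(idx), name])
    return paths


names = ["Pos_{:03d}".format(i) for i in range(3)]  # naming assumed from the diagram
position_map = default_position_paths(names)
print(position_map)
```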
- create_position(position, name)
Creates a column and position subgroup given the index and name. Name is provided by the main writer class
- Parameters:
position (int) – Index of the position to create
name (str) – Name of the position subgroup
- init_hierarchy()
method to init the default hierarchy. Will create the first row and initialize metadata fields
- class micro_dl.utils.io_utils.ReaderBase
Bases:
object
I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io
This will be updated if the io parts of waveorder are moved to a stand alone python package.
- get_array(position: int) ndarray
- get_image(p, t, c, z) ndarray
- get_num_positions() int
- get_zarr(position: int) array
- property shape
- class micro_dl.utils.io_utils.WriterBase(store, root_path)
Bases:
object
I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io
This will be updated if the io of waveorder is moved to a stand alone python package. ABC for all writer types
- create_channel_dict(chan_name, clim=None, first_chan=False)
Creates a dictionary used for OME-zarr metadata. Allows custom contrast limits and channel names for display. Defaults everything to grayscale.
- Parameters:
chan_name (str) – Desired name of the channel for display
clim (tuple) – Contrast limits (start, end, min, max)
first_chan (bool) – Whether or not this is the first channel of the dataset (display will be set to active)
- Return dict dict_:
Dictionary adherent to ome-zarr standards
- create_column(row_idx, idx, name=None)
Creates a column in the hierarchy (second level below zarr store, one below row). Option to name this column. Default is Col_{idx}. Keeps track of the column name + column index for later metadata creation.
- Parameters:
row_idx (int) – Index of the row to place the column underneath
idx (int) – Index of the column (order in which it is placed)
name (str) – Optional name to replace default column name
- create_position(position: int, name: str)
- create_row(idx, name=None)
Creates a row in the hierarchy (first level below zarr store). Option to name this row. Default is Row_{idx}. Keeps track of the row name + row index for later metadata creation.
- Parameters:
idx (int) – Index of the row (order in which it is placed)
name (str) – Optional name to replace default row name
- get_zarr()
- init_array(data_shape, chunk_size, dtype, chan_names, clims, overwrite=False)
Initializes the zarr array under the current position subgroup. Array level is called ‘arr_0’ in the hierarchy. Sets omero/multiscales metadata based upon chan_names and clims.
- Parameters:
data_shape (tuple) – Desired Shape of your data (T, C, Z, Y, X). Must match data
chunk_size (tuple) – Desired Chunk Size (T, C, Z, Y, X). Chunking each image would be (1, 1, 1, Y, X)
dtype (str or np.dtype) – Data Type, i.e. ‘uint16’ or np.uint16
chan_names (list) – List of strings corresponding to your channel names. Used for OME-zarr metadata
clims (list) – List of tuples corresponding to contrast limits for channel. OME-Zarr metadata. Each tuple can be (start, end, min, max) or (start, end)
overwrite (bool) – Whether or not to overwrite the existing data that may be present.
- init_hierarchy()
- open_position(position: int)
Opens a position based upon the position index. It navigates the rows/columns to find where this position is, based on the generated position map which keeps track of this information. It sets current_pos_group to this position for writing the data.
- Parameters:
position (int) – Index of the position you wish to open
- set_channel_attributes(chan_names, clims=None)
A method for creating ome-zarr metadata dictionary. Channel names are defined by the user, everything else is pre-defined.
- Parameters:
chan_names (list) – List of channel names in the order of the channel dimensions, i.e. if 3D Phase is C = 0, list ‘3DPhase’ first.
clims (list of tuples) – Contrast limits to display for every channel
- set_root(root)
Sets the root path of the zarr store. Used in the main writer class.
- Parameters:
root (str) – Path to the zarr store (folder ending in .zarr)
- set_store(store)
Sets the zarr store. Used in the main writer class.
- Parameters:
store (Zarr StoreObject) – Opened zarr store at the highest level
- set_verbosity(verbose: bool)
- write(data, t, c, z)
Write data to specified index of initialized zarr array
- Parameters:
data (nd-array) – Data to be saved. Must be the shape that matches indices (T, C, Z, Y, X)
t (list) – Index or index slice of the time dimension
c (list) – Index or index slice of the channel dimension
z (list) – Index or index slice of the z dimension
- class micro_dl.utils.io_utils.ZarrReader(zarrfile: str)
Bases:
ReaderBase
I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io
Reader for HCS ome-zarr arrays. OME-zarr structure can be found here: https://ngff.openmicroscopy.org/0.1/ Also collects the HCS metadata so it can be later copied.
- get_array(position)
Gets the (T, C, Z, Y, X) array at given position
- Parameters:
position (int) – Position index
- Return np.array pos:
Array of size (T, C, Z, Y, X) at specified position
- get_image(p, t, c, z)
Returns the image at dimension P, T, C, Z
- Parameters:
p (int) – Index of the position dimension
t (int) – Index of the time dimension
c (int) – Index of the channel dimension
z (int) – Index of the z dimension
- Return np.array image:
Image at the given dimension of shape (Y, X)
- get_image_plane_metadata(p, c, z)
For the sake of not keeping an enormous amount of metadata, only the microscope conditions for the first timepoint are kept in the zarr metadata during write. User can only query image plane metadata at p, c, z
- Parameters:
p (int) – Position index
c (int) – Channel index
z (int) – Z-slice index
- Return dict metadata:
Image Plane Metadata at given coordinate w/ T = 0
- get_num_positions() int
- get_zarr(position)
Returns the position-level zarr group array (not in memory)
- Parameters:
position (int) – Position index
- Return ZarrArray:
Zarr array containing the (T, C, Z, Y, X) array at given position
- class micro_dl.utils.io_utils.ZarrWriter(save_dir: Optional[str] = None, hcs_meta: Optional[dict] = None, verbose: bool = False)
Bases:
object
I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io
Given stokes or physical data, construct a standard hierarchy in zarr for output. Should conform to the ome-zarr standard as much as possible.
TODO: Allow for writing multiple positions in same store
- create_zarr_root(name)
Method for creating the root zarr store. If the store already exists, it will raise an error. Name corresponds to the root directory name (highest level) zarr store.
- Parameters:
name (str) – Name of the zarr store.
- current_group_name = None
- current_position = None
- init_array(position, data_shape, chunk_size, chan_names, dtype='float32', clims=None, position_name=None, overwrite=False)
Creates a subgroup structure based on position index. Then initializes the zarr array under the current position subgroup. Array level is called ‘array’ in the hierarchy.
- Parameters:
position (int) – Position index upon which to initialize array
data_shape (tuple) – Desired Shape of your data (T, C, Z, Y, X). Must match data
chunk_size (tuple) – Desired Chunk Size (T, C, Z, Y, X). Chunking each image would be (1, 1, 1, Y, X)
dtype (str) – Data Type, i.e. ‘uint16’
clims (list) – List of tuples corresponding to contrast limits for channel. OME-Zarr metadata
overwrite (bool) – Whether or not to overwrite the existing data that may be present.
chan_names (list) – List of strings corresponding to your channel names. Used for OME-zarr metadata
- store = None
- write(data, p, t=None, c=None, z=None)
Wrapper that calls the builder’s write function. Will write to existing array of zeros and place data over the specified indices.
- Parameters:
data (np.array) – Data to be saved. Must be the shape that matches indices (T, C, Z, Y, X)
p (int) – Position index in which to write the data into
t (int/slice) – Time index or index range of the time dimension
c (int/slice) – Channel index or index range of the channel dimension
z (int/slice) – Slice index or index range of the Z-slice dimension
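How the data shape must match the T, C, Z indices can be illustrated with a NumPy array standing in for the initialized zarr array. This is not the ZarrWriter API: `write_block` is a hypothetical helper, and treating None as "the whole dimension" is an assumption of the sketch.

```python
import numpy as np

# Stand-in for an initialized zarr array of zeros with shape (T, C, Z, Y, X)
data_shape = (2, 3, 4, 8, 8)
chunk_size = (1, 1, 1, 8, 8)  # one chunk per 2D image
array = np.zeros(data_shape, dtype="float32")
n_chunks = int(np.prod([d // c for d, c in zip(data_shape, chunk_size)]))


def write_block(array, data, t=None, c=None, z=None):
    """Hypothetical sketch: place data at the given T, C, Z indices or slices."""
    # None is treated here as "the whole dimension" (an assumption)
    index = tuple(slice(None) if i is None else i for i in (t, c, z))
    array[index] = data


# A full z-stack of shape (Z, Y, X) for time 0, channel 1
write_block(array, np.ones((4, 8, 8), dtype="float32"), t=0, c=1)
# Two channels of one plane: the data shape is then (C, Y, X)
write_block(array, np.full((2, 8, 8), 5.0, dtype="float32"), t=1, c=slice(0, 2), z=3)
```

An integer index removes its dimension from the expected data shape, while a slice keeps it, which is why the two calls take data of different shapes.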
micro_dl.utils.masks module
- micro_dl.utils.masks.create_otsu_mask(input_image, str_elem_size=3, thr=None, kernel_size=3, w_shed=False)
Create a binary mask using morphological operations. Opening removes small objects in the foreground.
- Parameters:
input_image (np.array) – generate masks from this image
str_elem_size (int) – size of the structuring element. typically 3, 5
thr (float) – Threshold
kernel_size (int) – Kernel size
w_shed (bool) – Whether to use watershed
- Returns:
mask of input_image, np.array
- micro_dl.utils.masks.create_unimodal_mask(input_image, str_elem_size=3, kernel_size=3)
Create a mask with unimodal thresholding and morphological operations. Unimodal thresholding seems to oversegment, erode it by a fraction
- Parameters:
input_image (np.array) – generate masks from this image
str_elem_size (int) – size of the structuring element. typically 3, 5
- Returns:
mask of input_image, np.array
- micro_dl.utils.masks.get_unet_border_weight_map(annotation, w0=10, sigma=5)
Return weight map for borders as specified in UNet paper. Note: The below method works for UNet segmentation only. TODO: Calculate boundaries directly and calculate distance from boundary of cells to another.
- Parameters:
annotation – A 2D array of shape (image_height, image_width) containing annotation with each class labeled as an integer.
w0 – Multiplier to the exponential distance loss, default 10 as mentioned in UNet paper
sigma – Standard deviation in the exponential distance term exp(-(d1 + d2)^2 / (2 * sigma^2)), default 5 as mentioned in UNet paper
- Returns:
Weight map for borders as specified in UNet
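The border term from the U-Net paper is w0 * exp(-(d1 + d2)^2 / (2 * sigma^2)), where d1 and d2 are the distances to the border of the nearest and second nearest cell. The sketch below evaluates only this term for given distances; computing the distance maps from an annotation is not shown, and `border_weight` is a hypothetical name.

```python
import numpy as np


def border_weight(d1, d2, w0=10, sigma=5):
    """Border term of the U-Net weight map for given distances d1 and d2."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    return w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))


# Example distances (in pixels) for three pixels
weights = border_weight(d1=[0.0, 2.0, 10.0], d2=[0.0, 3.0, 10.0])
print(weights)
```

The weight peaks at w0 where two cells touch and decays quickly with distance, so pixels in narrow gaps between cells dominate the loss.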
- micro_dl.utils.masks.get_unimodal_threshold(input_image)
Determines optimal unimodal threshold
https://users.cs.cf.ac.uk/Paul.Rosin/resources/papers/unimodal2.pdf https://www.mathworks.com/matlabcentral/fileexchange/45443-rosin-thresholding
- Parameters:
input_image (np.array) – generate mask for this image
- Return float best_threshold:
optimal lower threshold for the foreground hist
micro_dl.utils.meta_utils module
- micro_dl.utils.meta_utils.compute_zscore_params(frames_meta, ints_meta, input_dir, normalize_im, min_fraction=0.99)
Compute median and interquartile range of intensities in blocks/tiles determined by the ints_meta_generator function (saved in intensity_meta.csv). Masks need to be computed, and only tiles with enough foreground given masks (determined by min_fraction) will be included in the analysis.
- Parameters:
frames_meta (pd.DataFrame) – Dataframe containing all metadata
ints_meta (pd.DataFrame) – Metadata containing intensity statistics for each z-slice and foreground fraction for masks
input_dir (str) – Directory containing images
normalize_im (None/str) – normalization scheme for input images
min_fraction (float) – Minimum foreground fraction of masks for computing intensity statistics.
- Return pd.DataFrame frames_meta:
DataFrame containing all metadata
- Return pd.DataFrame ints_meta:
Metadata containing intensity statistics for each z-slice
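The statistics involved can be illustrated with NumPy on plain arrays. The sketch keeps only samples whose foreground fraction reaches min_fraction and then computes the median and interquartile range; any grouping by channel or position is omitted, and `robust_zscore_params` is a hypothetical name.

```python
import numpy as np


def robust_zscore_params(intensities, fg_fractions, min_fraction=0.99):
    """Hypothetical sketch: median and IQR over tiles with enough foreground."""
    intensities = np.asarray(intensities, dtype=float)
    fg_fractions = np.asarray(fg_fractions, dtype=float)
    keep = fg_fractions >= min_fraction
    assert keep.any(), "no tiles with enough foreground"
    q25, median, q75 = np.percentile(intensities[keep], [25, 50, 75])
    return median, q75 - q25


# The last sample lies mostly in the background and is excluded
median, iqr = robust_zscore_params(
    intensities=[10, 20, 30, 40, 1000],
    fg_fractions=[1.0, 1.0, 1.0, 1.0, 0.2],
)
print(median, iqr)
```

An image could then be normalized as (im - median) / iqr, which is less sensitive to outliers than using mean and standard deviation.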
- micro_dl.utils.meta_utils.frames_meta_from_filenames(input_dir, name_parser)
Extracts metadata (channel, position, time, slice) from file name.
- Parameters:
input_dir (str) – path to input directory containing images
name_parser (str) – Function in aux_utils for parsing indices from file name
- Return pd.DataFrame frames_meta:
Metadata for all frames in dataset
- micro_dl.utils.meta_utils.frames_meta_from_zarr(input_dir, file_names)
Reads ome-zarr file and creates frames_meta based on metadata and array information. Assumes one zarr store per position according to OME guidelines.
- Parameters:
input_dir (str) – Input directory
file_names (list) – List of full paths to all zarr files in dir
- Return pd.DataFrame frames_meta:
Metadata for all frames in zarr
- micro_dl.utils.meta_utils.frames_meta_generator(input_dir, file_format='zarr', name_parser='parse_sms_name')
Generate metadata from file names, or from stored metadata in the case of zarr files, for preprocessing. Writes the metadata found to frames_metadata.csv in the input directory.
Naming convention for the default parser ‘parse_sms_name’: img_channelname_t***_p***_z***.tif
The file structure for ome-zarr files is described here: https://ngff.openmicroscopy.org/0.1/
- Parameters:
input_dir (str) – path to input directory containing image data
file_format (str) – Image file format (‘zarr’ or ‘tiff’ or ‘png’)
name_parser (str) – Function in aux_utils for parsing indices from tiff/png file name
- Return pd.DataFrame frames_meta:
Metadata for all frames in dataset
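As an illustration of the naming convention img_channelname_t***_p***_z***.tif, indices could be pulled out of a file name along these lines. This is a sketch only: parse_name_sketch and the regular expression are hypothetical and do not reproduce the actual parse_sms_name parser in aux_utils.

```python
import re

# Assumed layout: img_<channel name>_t<time>_p<position>_z<slice>.<ext>
NAME_RE = re.compile(
    r"^img_(?P<channel>.+)_t(?P<t>\d+)_p(?P<p>\d+)_z(?P<z>\d+)\.(tif|tiff|png)$"
)


def parse_name_sketch(file_name):
    """Return a metadata dict for one file name (hypothetical helper)."""
    match = NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"Unexpected file name: {file_name}")
    return {
        "channel_name": match["channel"],
        "time_idx": int(match["t"]),
        "pos_idx": int(match["p"]),
        "slice_idx": int(match["z"]),
        "file_name": file_name,
    }


meta_row = parse_name_sketch("img_phase_t005_p012_z003.tif")
```

One such dict per file, collected into a pd.DataFrame, would give a table with one row per frame.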
- micro_dl.utils.meta_utils.ints_meta_generator(input_dir, channel_ids, num_workers=4, block_size=256, flat_field_dir=None)
Generate pixel intensity metadata for estimating image normalization parameters during the preprocessing step. Pixels are sub-sampled from the image following a grid pattern defined by block_size for efficient estimation of median and interquartile range. Grid sampling is preferred over random sampling in this case due to the spatial correlation in images. Writes the data found to ints_meta.csv in the input directory. The assumed default naming convention for tiff files is: img_channelname_t***_p***_z***.tif for parse_sms_name
- Parameters:
input_dir (str) – path to input directory containing images
channel_ids (list) – Channel indices to process
num_workers (int) – number of workers for multiprocessing
block_size (int) – block size for the grid sampling pattern. Default value works well for 2048 X 2048 images.
flat_field_dir (str) – Directory containing flatfield images
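The grid sub-sampling idea can be sketched with NumPy as below. The helper name sample_on_grid and the choice to start half a block in from the image edge are assumptions for illustration; the library's actual sampling offsets may differ.

```python
import numpy as np


def sample_on_grid(im, block_size):
    """Sample one pixel every block_size rows and columns (hypothetical helper).

    Returns flattened row indices, column indices and sampled intensities.
    """
    # Assumption: start half a block in, so samples sit near block centers
    start = block_size // 2
    rows = np.arange(start, im.shape[0], block_size)
    cols = np.arange(start, im.shape[1], block_size)
    row_grid, col_grid = np.meshgrid(rows, cols, indexing="ij")
    return row_grid.ravel(), col_grid.ravel(), im[row_grid, col_grid].ravel()


# Small toy image: pixel value equals its flat index
im = np.arange(64, dtype=np.float64).reshape(8, 8)
rows, cols, samples = sample_on_grid(im, block_size=4)

# Robust statistics estimated from the sub-sampled pixels only
median = np.median(samples)
q25, q75 = np.percentile(samples, [25, 75])
iqr = q75 - q25
```

With block_size=256 on a 2048 x 2048 image, this kind of grid yields 64 samples per image instead of roughly four million pixels.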
- micro_dl.utils.meta_utils.mask_meta_generator(input_dir, num_workers=4)
Generate pixel intensity metadata for estimating image normalization parameters during the preprocessing step. Pixels are sub-sampled from the image following a grid pattern defined by block_size for efficient estimation of median and interquartile range. Grid sampling is preferred over random sampling in this case due to the spatial correlation in images. Writes the data found to intensity_meta.csv in the input directory. Assumed default file naming convention is:
img_channelname_t***_p***_z***.tif for parse_sms_name
- Parameters:
input_dir (str) – path to input directory containing images
order (str) – Order in which file name encodes cztp
name_parser (str) – Function in aux_utils for parsing indices from file name
num_workers (int) – number of workers for multiprocessing
- Return pd.DataFrame mask_meta:
Metadata with mask info
micro_dl.utils.mp_utils module
- micro_dl.utils.mp_utils.create_save_mask(channels_meta_sub, flat_field_fnames, str_elem_radius, mask_dir, mask_channel_idx, int2str_len, mask_type, mask_ext, dir_name=None, channel_thrs=None)
Create and save mask. When more than one channel is used to generate the mask, a mask is generated for each channel and the masks are then added together.
- Parameters:
channels_meta_sub (pd.DataFrame) – Metadata for given PTCZ
flat_field_fnames (list/None) – Paths to corresponding flat field images
str_elem_radius (int) – size of structuring element used for binary opening. str_elem: disk or ball
mask_dir (str) – dir to save masks
mask_channel_idx (int) – channel number of mask
time_idx (int) – time points to use for generating mask
pos_idx (int) – generate masks for given position / sample ids
slice_idx (int) – generate masks for given slice ids
int2str_len (int) – Length of str when converting ints
mask_type (str) – thresholding type used for masking or str to map to masking function
mask_ext (str) – ‘.npy’ or ‘.png’. Save the mask as uint8 PNG or NPY files for otsu, unimodal masks, recommended to save as npy float64 for borders_weight_loss_map masks to avoid loss due to scaling it to uint8.
dir_name (str/None) – Image directory (none if using frames_meta dir_name)
channel_thrs (list) – list of threshold for each channel to generate binary masks. Only used when mask_type is ‘dataset_otsu’
- Return dict cur_meta:
For each mask, fg_frac is added to metadata
- micro_dl.utils.mp_utils.crop_at_indices_save(meta_sub, flat_field_fname, hist_clip_limits, slice_idx, crop_indices, image_format, save_dir, dir_name=None, int2str_len=3, is_mask=False, tile_3d=False, normalize_im=True, zscore_mean=None, zscore_std=None)
Crop image into tiles at given indices and save.
- Parameters:
meta_sub (pd.DataFrame) – Subset of metadata for images to be cropped
flat_field_fname (str) – File name of flat field image
hist_clip_limits (tuple) – limits for histogram clipping
time_idx (int) – time point of input image
channel_idx (int) – channel idx of input image
slice_idx (int) – slice idx of input image
pos_idx (int) – sample idx of input image
crop_indices (tuple) – tuple of indices for cropping
image_format (str) – zyx or xyz
save_dir (str) – output dir to save tiles
dir_name (str/None) – Input directory
int2str_len (int) – len of indices for creating file names
is_mask (bool) – Indicates if files are masks
tile_3d (bool) – indicator for tiling in 3D
- Returns:
pd.DataFrame from a list of dicts with metadata
- micro_dl.utils.mp_utils.get_im_stats(im_path)
Reads an image and computes its statistics.
- Parameters:
im_path (str) – Full path to image
- Return dict meta_row:
Dict with intensity data for image
- micro_dl.utils.mp_utils.get_mask_meta_row(file_path, meta_row)
Given path to mask, read mask, compute foreground fraction and fill in corresponding metadata row.
- Parameters:
file_path (str) – Path to binary mask image
meta_row (pd.DataFrame) – Metadata row to fill in
- Return pd.DataFrame meta_row:
Metadata row with foreground fraction for mask
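The foreground fraction of a binary mask is simply the share of nonzero pixels. A minimal sketch, assuming the mask is already loaded as an array (foreground_fraction is a hypothetical name, not a function in this module):

```python
import numpy as np


def foreground_fraction(mask):
    """Fraction of pixels in the mask that are foreground (nonzero)."""
    return float(np.mean(mask > 0))


# 4 x 4 mask with a 2 x 2 foreground square
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 255
fg_frac = foreground_fraction(mask)
```

Comparing against zero rather than one keeps the result the same whether the mask is stored as bool, 0/1, or 0/255 uint8.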
- micro_dl.utils.mp_utils.mp_create_save_mask(fn_args, workers)
Create and save masks with multiprocessing
- Parameters:
fn_args (list of tuple) – list with tuples of function arguments
workers (int) – max number of workers
- Returns:
list of returned dicts from create_save_mask
- micro_dl.utils.mp_utils.mp_crop_save(fn_args, workers)
Crop and save images with multiprocessing.
- Parameters:
fn_args (list of tuple) – list with tuples of function arguments
workers (int) – max number of workers
- Returns:
list of returned df from crop_at_indices_save
- micro_dl.utils.mp_utils.mp_get_im_stats(fn_args, workers)
Reads images and computes their statistics with multiprocessing.
- Parameters:
fn_args (list of tuple) – list with tuples of function arguments
workers (int) – max number of workers
- Returns:
list of returned dicts from get_im_stats
- micro_dl.utils.mp_utils.mp_rescale_vol(fn_args, workers)
Rescale and save image stacks with multiprocessing.
- Parameters:
fn_args (list of tuple) – list with tuples of function arguments
workers (int) – max number of workers
- micro_dl.utils.mp_utils.mp_resize_save(mp_args, workers)
Resize and save images with multiprocessing.
- Parameters:
mp_args (list) – Function keyword arguments
workers (int) – max number of workers
- micro_dl.utils.mp_utils.mp_sample_im_pixels(fn_args, workers)
Reads images and computes statistics of sampled pixels with multiprocessing.
- Parameters:
fn_args (list of tuple) – list with tuples of function arguments
workers (int) – max number of workers
- Returns:
list of returned lists of dicts from sample_im_pixels
- micro_dl.utils.mp_utils.mp_tile_save(fn_args, workers)
Tile and save with multiprocessing https://stackoverflow.com/questions/42074501/python-concurrent-futures-processpoolexecutor-performance-of-submit-vs-map
- Parameters:
fn_args (list of tuple) – list with tuples of function arguments
workers (int) – max number of workers
- Returns:
list of returned df from tile_and_save
- micro_dl.utils.mp_utils.mp_wrapper(fn, fn_args, workers)
Run a function over a list of argument tuples with multiprocessing
- Parameters:
fn (function) – function to run on each tuple of arguments
fn_args (list of tuple) – list with tuples of function arguments
workers (int) – max number of workers
- Returns:
list of values returned by fn
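All of the mp_* helpers above take a list of argument tuples and a worker count. One common way to implement that pattern with concurrent.futures is sketched below; run_parallel, scaled_sum and the executor_cls parameter are hypothetical names for this example, and the module's own implementation may differ.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_parallel(fn, fn_args, workers, executor_cls=ProcessPoolExecutor):
    """Apply fn to each tuple in fn_args using a pool of workers.

    executor_cls is an illustrative extra: ThreadPoolExecutor can be swapped
    in for debugging or for I/O-bound work.
    """
    with executor_cls(workers) as executor:
        # Executor.map expects one iterable per positional argument, so the
        # list of argument tuples is transposed with zip(*fn_args).
        results = executor.map(fn, *zip(*fn_args))
        return list(results)


def scaled_sum(a, b, scale):
    """Toy worker function; must be defined at module level to be picklable."""
    return (a + b) * scale
```

A call such as run_parallel(scaled_sum, [(1, 2, 10), (3, 4, 10)], workers=2) returns results in the same order as the input tuples. With process pools, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork.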
- micro_dl.utils.mp_utils.rescale_vol_and_save(time_idx, pos_idx, channel_idx, slice_start_idx, slice_end_idx, frames_metadata, dir_name, output_fname, scale_factor, ff_path)
Rescale volumes and save.
- Parameters:
time_idx (int) – time point of input image
pos_idx (int) – sample idx of input image
channel_idx (int) – channel idx of input image
slice_start_idx (int) – start slice idx for the vol to be saved
slice_end_idx (int) – end slice idx for the vol to be saved
frames_metadata (pd.Dataframe) – metadata for the input slices
dir_name (str/None) – Image directory (none if using dir_name from frames_meta)
output_fname (str) – output_fname
scale_factor (float/list) – scale factor for resizing
ff_path (str/None) – path to flat field image
- micro_dl.utils.mp_utils.resize_and_save(**kwargs)
Resizes images and saves them. Performs flatfield correction prior to resizing if flatfield images are present.
- Parameters:
kwargs – Keyword arguments:
str file_path: Path to input image
str write_path: Path to image to be written
float scale_factor: Scale factor for resizing
str ff_path: Path to flat field correction image
- micro_dl.utils.mp_utils.sample_im_pixels(meta_row, ff_path, grid_spacing, dir_name=None)
Reads an image and computes statistics for each point in a grid. Grid spacing determines the distance in pixels between grid points for rows and cols. Applies flatfield correction prior to intensity sampling if a flatfield path is specified.
- Parameters:
meta_row (dict) – Metadata row for image
ff_path (str) – Full path to flatfield image corresponding to image
grid_spacing (int) – Distance in pixels between sampling points
dir_name (str/None) – Image directory (none if using dir_name from frames_meta)
- Return list meta_rows:
Dicts with intensity data for each grid point
- micro_dl.utils.mp_utils.tile_and_save(meta_sub, flat_field_fname, hist_clip_limits, slice_idx, tile_size, step_size, min_fraction, image_format, save_dir, dir_name=None, int2str_len=3, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)
Tile image using the given tile and step sizes, and save the tiles.
- Parameters:
meta_sub (pd.DataFrame) – Subset of metadata for images to be tiled
flat_field_fname (str) – fname of flat field image
hist_clip_limits (tuple) – limits for histogram clipping
slice_idx (int) – slice idx of input image
tile_size (list) – size of tile along row, col (& slices)
step_size (list) – step size along row, col (& slices)
min_fraction (float) – min foreground volume fraction for keeping a tile
image_format (str) – zyx / xyz
save_dir (str) – output dir to save tiles
dir_name (str/None) – Image directory
int2str_len (int) – len of indices for creating file names
is_mask (bool) – Indicates if files are masks
normalize_im (str/None) – Normalization method
zscore_mean (float/None) – Mean for normalization
zscore_std (float/None) – Std for normalization
- Returns:
pd.DataFrame from a list of dicts with metadata
micro_dl.utils.network_utils module
- micro_dl.utils.network_utils.create_activation_layer(activation_dict)
Get the keras activation / advanced activation
- Parameters:
activation_dict (dict) – Nested dict with keys: type (activation type) and params (dict of activation-related parameters such as alpha, theta, alpha_initializer, alpha_regularizer, etc. from advanced activations)
- Return keras.layer:
instance of activation layer
- micro_dl.utils.network_utils.get_keras_layer(type, num_dims)
Get the 2D or 3D keras layer
- Parameters:
type (str) – type of layer [conv, pooling, upsampling]
num_dims (int) – dimensionality of the image [2, 3]
- Returns:
keras.layer
- micro_dl.utils.network_utils.get_layer_shape(layer_shape, data_format)
Get the layer shape without the batch and channel dimensions
- Parameters:
layer_shape (list) – output of layer.get_output_shape.as_list()
data_format (str) – in [channels_first, channels_last]
- Returns:
np.array layer_shape_xyz - layer shape without batch and channel dimensions
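Dropping the batch and channel dimensions depends only on where the channel axis sits. A minimal sketch of that logic, assuming the batch dimension is always first (spatial_shape is a hypothetical name for illustration):

```python
import numpy as np


def spatial_shape(layer_shape, data_format):
    """Return the layer shape without batch and channel dimensions."""
    if data_format == "channels_first":
        # Layout: (batch, channels, *spatial)
        spatial = layer_shape[2:]
    elif data_format == "channels_last":
        # Layout: (batch, *spatial, channels)
        spatial = layer_shape[1:-1]
    else:
        raise ValueError(f"Unknown data_format: {data_format}")
    return np.array(spatial)


shape_last = spatial_shape([None, 32, 64, 64, 16], "channels_last")
shape_first = spatial_shape([None, 16, 32, 64, 64], "channels_first")
```

Both calls give the same spatial shape, which is the point of normalizing away the data format.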
micro_dl.utils.normalize module
Image normalization related functions
- micro_dl.utils.normalize.hist_adapteq_2D(input_image, kernel_size=None, clip_limit=None)
CLAHE on 2D images
skimage.exposure.equalize_adapthist works only for 2D. Extend to 3D or use openCV? Not ideal, as it enhances noise in homogeneous areas
- Parameters:
input_image (np.array) – input image for intensity normalization
kernel_size (int/list) – Neighbourhood to be used for histogram equalization. If none, use default of 1/8th image size.
clip_limit (float) – Clipping limit, normalized between 0 and 1 (higher values give more contrast, ~ max percent of voxels in any histogram bin, if > this limit, the voxel intensities are redistributed). if None, default=0.01
- micro_dl.utils.normalize.hist_clipping(input_image, min_percentile=2, max_percentile=98)
Clips and rescales histogram from min to max intensity percentiles
rescale_intensity with input check
- Parameters:
input_image (np.array) – input image for intensity normalization
min_percentile (int/float) – min intensity percentile
max_percentile (int/float) – max intensity percentile
- Returns:
np.float, intensity clipped and rescaled image
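Percentile-based clipping followed by rescaling can be sketched with plain NumPy as follows. The output range of [0, 1] and the name clip_and_rescale are assumptions of this example; the library function delegates to rescale_intensity, whose output range may differ.

```python
import numpy as np


def clip_and_rescale(im, min_percentile=2, max_percentile=98):
    """Clip intensities to the given percentiles, then rescale to [0, 1]."""
    if not 0 <= min_percentile < max_percentile <= 100:
        raise ValueError("Percentiles must satisfy 0 <= min < max <= 100")
    low, high = np.percentile(im, (min_percentile, max_percentile))
    clipped = np.clip(im.astype(np.float64), low, high)
    if high == low:
        # Flat image: nothing to stretch
        return np.zeros_like(clipped)
    return (clipped - low) / (high - low)


im = np.arange(101, dtype=np.float64)
out = clip_and_rescale(im)
```

Values below the 2nd percentile map to 0 and values above the 98th percentile map to 1, so a few extreme pixels no longer dominate the intensity range.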
- micro_dl.utils.normalize.unzscore(im_norm, zscore_median, zscore_iqr)
Revert z-score normalization applied during preprocessing. Necessary before computing SSIM
- Parameters:
im_norm – Normalized image for un-zscore
zscore_median – Image median
zscore_iqr – Image interquartile range
- Return im:
image at its original scale
- micro_dl.utils.normalize.zscore(input_image, im_mean=None, im_std=None)
Performs z-score normalization. Adds epsilon in denominator for robustness
- Parameters:
input_image (np.array) – input image for intensity normalization
im_mean (float/None) – Image mean
im_std (float/None) – Image std
- Return np.array norm_img:
z score normalized image
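The normalization and its inverse can be sketched as a pair of functions. This is a simplified illustration: the function names are hypothetical, the exact epsilon used by the library is not documented here, and adding the same epsilon in the inverse is an assumption made so that the round trip is exact.

```python
import numpy as np

# Small constant that keeps the division finite for constant images
EPS = np.finfo(float).eps


def zscore_sketch(im, im_mean=None, im_std=None):
    """Subtract a center and divide by a spread (plus epsilon)."""
    im_mean = np.mean(im) if im_mean is None else im_mean
    im_std = np.std(im) if im_std is None else im_std
    return (im - im_mean) / (im_std + EPS)


def unzscore_sketch(im_norm, zscore_median, zscore_iqr):
    """Undo the normalization using the same center and spread."""
    return im_norm * (zscore_iqr + EPS) + zscore_median


im = np.array([[1.0, 2.0], [3.0, 6.0]])
# Normalize with a median/IQR pair, then revert to the original scale
restored = unzscore_sketch(zscore_sketch(im, 2.5, 1.5), 2.5, 1.5)
```

Whatever pair of statistics is used for normalization (mean and std, or median and interquartile range) has to be the pair passed to the inverse; otherwise the image does not return to its original scale.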
micro_dl.utils.preprocess_utils module
- micro_dl.utils.preprocess_utils.get_preprocess_config(data_dir)
- micro_dl.utils.preprocess_utils.validate_mask_meta(mask_dir, input_dir, csv_name=None, mask_channel=None)
If user provides existing masks, the mask directory should also contain a csv file (not named frames_meta.csv which is reserved for output) with two column names: mask_name and file_name. Each row should describe the mask name and the corresponding file name. Each file_name should exist in input_dir and belong to the same channel. This function checks that all file names exist in input_dir and writes a frames_meta csv containing mask names with indices corresponding to the matched file_name. It also assigns a mask channel number for future preprocessing steps like tiling.
- Parameters:
mask_dir (str) – Mask directory
input_dir (str) – Input image directory, to match masks with images
mask_channel (int/None) – Channel idx assigned to masks
- Return int mask_channel:
New channel index for masks for writing tiles
- Raises:
IOError: If no csv file is present in mask_dir
- Raises:
IOError: If more than one csv file exists in mask_dir and no csv_name is provided to resolve ambiguity
- Raises:
AssertionError: If csv doesn’t consist of two columns named ‘mask_name’ and ‘file_name’
- Raises:
IndexError: If unable to match file_name in mask_dir csv with file_name in input_dir for any given mask row
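The checks listed above (two required columns, every file_name present in the input directory) can be sketched with the standard library. check_mask_csv is a hypothetical helper that works on CSV text and a list of file names rather than on directories, purely to keep the example self-contained.

```python
import csv
import io


def check_mask_csv(csv_text, input_file_names):
    """Validate a mask csv and return its rows as dicts (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if set(reader.fieldnames or []) != {"mask_name", "file_name"}:
        raise AssertionError("csv must have columns mask_name and file_name")
    known = set(input_file_names)
    rows = list(reader)
    for row in rows:
        # Every mask must point at an image that exists in the input directory
        if row["file_name"] not in known:
            raise IndexError(f"No input image matches {row['file_name']}")
    return rows


rows = check_mask_csv(
    "mask_name,file_name\nmask_0.png,img_0.tif\n",
    ["img_0.tif", "img_1.tif"],
)
```

In real use the CSV text would come from the csv file in mask_dir and the file names from the frames metadata of input_dir.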
micro_dl.utils.tile_utils module
- micro_dl.utils.tile_utils.crop_at_indices(input_image, crop_indices, save_dict=None, tile_3d=False)
Crop image into tiles at given indices.
- Parameters:
input_image (np.array) – input image for cropping
crop_indices (list/tuple) – list of indices for cropping
save_dict (dict/None) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generating the output fname
tile_3d (bool) – boolean flag for adding slice_start_idx to meta
- Returns:
if not saving tiles: a list with tuples of cropped image id of the format rrmin-rmax_ccmin-cmax_slslmin-slmax and cropped image. Else saves tiles in-place and returns a df with tile metadata
- micro_dl.utils.tile_utils.tile_image(input_image, tile_size, step_size, return_index=False, min_fraction=None, save_dict=None)
Tiles the image based on given tile and step size. USE MIN_FRACTION WITH INPUT_IMAGE.DTYPE=bool / MASKS
- Parameters:
input_image (np.array) – 3D input image to be tiled
tile_size (list/tuple/np array) – size of the blocks to be tiled from the image
step_size (list/tuple/np array) – size of the window shift. In case of no overlap, the step size is tile_size. If overlap, step_size < tile_size
return_index (bool) – indicator for returning tile indices
min_fraction (float) – Minimum fraction of foreground in mask for including tile
save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generating the output fname
- Returns:
If not saving: a list with tuples of tiled image id (format rrmin-rmax_ccmin-cmax_slslmin-slmax) and tiled image. Else: saves tiles in-place and returns a df with tile metadata. If return_index=True: returns a list with tuples of crop indices.
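The relationship between tile_size, step_size and min_fraction can be illustrated with a simplified 2D sketch. tile_2d is a hypothetical function: it ignores the slice dimension, drops partial tiles at the border, and does no saving, all of which are simplifications relative to the documented function.

```python
import numpy as np


def tile_2d(im, tile_size, step_size, min_fraction=None):
    """Return (tile_id, tile) tuples for a 2D image (hypothetical helper).

    When min_fraction is given, im is treated as a mask and tiles with too
    little foreground are skipped.
    """
    tiles = []
    n_rows, n_cols = im.shape
    for r in range(0, n_rows - tile_size[0] + 1, step_size[0]):
        for c in range(0, n_cols - tile_size[1] + 1, step_size[1]):
            r_end, c_end = r + tile_size[0], c + tile_size[1]
            tile = im[r:r_end, c:c_end]
            if min_fraction is not None and np.mean(tile > 0) < min_fraction:
                continue
            tiles.append((f"r{r}-{r_end}_c{c}-{c_end}", tile))
    return tiles


# 4 x 4 mask with foreground only in the top-left corner
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
no_overlap = tile_2d(mask, (2, 2), (2, 2))
overlapping = tile_2d(mask, (2, 2), (1, 1))
foreground_only = tile_2d(mask, (2, 2), (2, 2), min_fraction=0.5)
```

A step size equal to the tile size gives non-overlapping tiles; a smaller step size gives overlapping tiles and therefore more of them.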
- micro_dl.utils.tile_utils.write_meta(tiled_metadata, save_dict)
Write meta for tiles from an image as a csv
- Parameters:
tiled_metadata (list) – list of meta dicts
save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generating the output fname
- Returns:
- micro_dl.utils.tile_utils.write_tile(tile, file_name, save_dict)
Write tile function that can be called using threading.
- Parameters:
tile (np.array) – one tile
file_name (str) – File name for tile (must be .npy format)
save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx,
- Return str op_fname:
filename used for saving the tile with entire path
micro_dl.utils.train_utils module
Utility functions used for training
- micro_dl.utils.train_utils.check_gpu_availability(gpu_id)
Check if mem_frac is available in given gpu_id
- Parameters:
gpu_id (int/list) – id of the gpu to be used. Int for single GPU training, list for distributed training
gpu_mem_frac (list) – mem fraction for each GPU in gpu_id
- Return bool gpu_availability:
True if all mem_fracs are greater than gpu_mem_frac
- Return list curr_mem_frac:
list of current memory fractions available for the GPUs in gpu_id
- micro_dl.utils.train_utils.get_loss(loss_str)
Get loss type from config
- micro_dl.utils.train_utils.get_metrics(metrics_list)
Get the metrics from config
- micro_dl.utils.train_utils.select_gpu(gpu_ids=None, gpu_mem_frac=None)
Find the GPU ID with highest available memory fraction. If ID is given as input, set the gpu_mem_frac to maximum available, or if a memory fraction is given, make sure the given GPU has the desired memory fraction available. Currently only supports single GPU runs.
- Parameters:
gpu_ids (int) – Desired GPU ID. If None, find GPU with the most memory available.
gpu_mem_frac (float) – Desired GPU memory fraction [0, 1]. If None, use maximum available amount of GPU.
- Return int gpu_ids:
GPU ID to use.
- Return float cur_mem_frac:
GPU memory fraction to use
- Raises:
NotImplementedError: If gpu_ids is not int
- Raises:
AssertionError: If requested memory fraction isn’t available
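The selection rules described above can be sketched as pure Python over a list of free memory fractions. pick_gpu and its free_fracs argument are hypothetical: how the library queries the available memory per GPU is not shown here, so the fractions are passed in directly.

```python
def pick_gpu(free_fracs, gpu_id=None, gpu_mem_frac=None):
    """Choose a GPU ID and memory fraction (hypothetical helper).

    free_fracs is a list of available memory fractions indexed by GPU ID.
    """
    if gpu_id is None:
        # No preference: take the GPU with the most free memory
        gpu_id = max(range(len(free_fracs)), key=free_fracs.__getitem__)
    elif not isinstance(gpu_id, int):
        raise NotImplementedError("Only single GPU runs are sketched here")
    available = free_fracs[gpu_id]
    if gpu_mem_frac is None:
        # No requested fraction: use everything that is available
        return gpu_id, available
    if gpu_mem_frac > available:
        raise AssertionError(
            f"GPU {gpu_id} has {available:.2f} free, requested {gpu_mem_frac:.2f}"
        )
    return gpu_id, gpu_mem_frac


best = pick_gpu([0.2, 0.9, 0.5])
requested = pick_gpu([0.2, 0.9, 0.5], gpu_id=2, gpu_mem_frac=0.4)
```

The two error branches mirror the documented NotImplementedError for non-int IDs and AssertionError for an unavailable memory fraction.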
- micro_dl.utils.train_utils.set_keras_session(gpu_ids, gpu_mem_frac)
Set the Keras session
Module contents
Module for utility functions