micro_dl.utils module

Submodules

micro_dl.utils.aux_utils module

micro_dl.utils.aux_utils.adjust_slice_margins(slice_ids, depth)

Adjusts slice indices to given z depth by removing indices too close to boundaries. Assumes that slice indices are contiguous.

Parameters:
  • slice_ids (list of ints) – Slice (z) indices

  • depth (int) – Number of z slices

Return list slice_ids:

Slice indices with adjusted margins

Raises:
  • AssertionError – if depth is even

  • AssertionError – if there aren’t enough slice ids for given depth

  • AssertionError – if slices aren’t contiguous
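The margin logic described above can be sketched as follows (a minimal illustration under the stated assumptions, not the actual micro_dl implementation): the first and last depth // 2 indices are too close to the boundary to center a z stack on, so they are dropped.

```python
def adjust_slice_margins(slice_ids, depth):
    # Depth must be odd so a stack can be centered on a slice
    assert depth % 2 == 1, "depth must be odd"
    margin = depth // 2
    assert len(slice_ids) > 2 * margin, "not enough slice ids for given depth"
    # Assumes contiguous indices
    assert slice_ids == list(range(slice_ids[0], slice_ids[-1] + 1)), \
        "slice ids must be contiguous"
    # Drop indices too close to the boundaries
    return slice_ids[margin:-margin] if margin else slice_ids
```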

micro_dl.utils.aux_utils.convert_channel_names_to_ids(channel_map, channel_list)

Given a channel map from get_channels and a list of channel names, return the corresponding list of channel indices.

Parameters:
  • channel_map (dict) – Channel names with indices

  • channel_list (list) – List of channel names (subset of channel_map). If the list already contains ints, it is returned as is.

Return list channel_ids:

List of (int) channel indices

Raises:

AssertionError – if any channel in list is not in channel_map
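A minimal sketch of this conversion (hypothetical stand-in, not the library's exact code):

```python
def convert_channel_names_to_ids(channel_map, channel_list):
    # If the list already contains ints, return it unchanged
    if all(isinstance(c, int) for c in channel_list):
        return channel_list
    for name in channel_list:
        assert name in channel_map, "{} not in channel_map".format(name)
    return [channel_map[name] for name in channel_list]
```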

micro_dl.utils.aux_utils.get_channel_axis(data_format)

Get the channel axis given the data format

Parameters:

data_format (str) – One of [channels_first, channels_last]

Return int channel_axis:

Channel axis given the data format

micro_dl.utils.aux_utils.get_channels(frames_meta)

Find channel names and their corresponding indices in the frames metadata.

Parameters:

frames_meta (pd.DataFrame) – Metadata for frames

Return dict channel_map:

Channel name and corresponding index

Raises:

AssertionError – if channel name column is incompletely populated

micro_dl.utils.aux_utils.get_im_name(time_idx=None, channel_idx=None, slice_idx=None, pos_idx=None, extra_field=None, ext='.png', int2str_len=3)

Create an image name given parameters and extension

Parameters:
  • time_idx (int) – Time index

  • channel_idx (int) – Channel index

  • slice_idx (int) – Slice (z) index

  • pos_idx (int) – Position (FOV) index

  • extra_field (str) – Any extra string you want to include in the name

  • ext (str) – Extension, e.g. ‘.png’ or ‘.npy’

  • int2str_len (int) – Length of string of the converted integers

Return str im_name:

Image file name
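The resulting name is of the form im_c###_z###_t###_p###.png with zero-padded indices. A sketch of the assembly (the exact field order is an assumption based on parse_idx_from_name's default 'cztp' order, and omitted indices are simply skipped):

```python
def get_im_name(time_idx=None, channel_idx=None, slice_idx=None,
                pos_idx=None, extra_field=None, ext='.png', int2str_len=3):
    # Append each provided index, zero-padded to int2str_len digits
    name = 'im'
    for prefix, idx in [('c', channel_idx), ('z', slice_idx),
                        ('t', time_idx), ('p', pos_idx)]:
        if idx is not None:
            name += '_{}{}'.format(prefix, str(idx).zfill(int2str_len))
    if extra_field is not None:
        name += '_' + extra_field
    return name + ext
```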

micro_dl.utils.aux_utils.get_meta_idx(frames_metadata, time_idx, channel_idx, slice_idx, pos_idx)

Get row index in metadata dataframe given variable indices

Parameters:
  • frames_metadata (dataframe) – Dataframe with column names given below

  • time_idx (int) – Timepoint index

  • channel_idx (int) – Channel index

  • slice_idx (int) – Slice (z) index

  • pos_idx (int) – Position (FOV) index

Returns:

int row_idx: Row index matching the indices above
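Using a plain list of dicts in place of the DataFrame, the lookup this describes is essentially (an illustrative sketch, not the pandas-based implementation):

```python
def get_meta_idx(frames_metadata, time_idx, channel_idx, slice_idx, pos_idx):
    # frames_metadata: list of dicts standing in for the metadata dataframe;
    # return the index of the first row matching all four indices
    for i, row in enumerate(frames_metadata):
        if (row['time_idx'] == time_idx
                and row['channel_idx'] == channel_idx
                and row['slice_idx'] == slice_idx
                and row['pos_idx'] == pos_idx):
            return i
    return None
```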

micro_dl.utils.aux_utils.get_row_idx(frames_metadata, time_idx, channel_idx, slice_idx=-1, pos_idx=-1, dir_names=None)

Get the indices for images with timepoint_idx and channel_idx

Parameters:
  • frames_metadata (pd.DataFrame) – DF with columns [time_idx, channel_idx, slice_idx, file_name]

  • time_idx (int) – get info for this timepoint

  • channel_idx (int) – get info for this channel

  • slice_idx (int) – get info for this focal plane (2D)

  • pos_idx (int) – Specify FOV (default to all if -1)

  • dir_names (str) – Directory names, if not in dataframe

Return row_idx:

Row index in dataframe

micro_dl.utils.aux_utils.get_sms_im_name(time_idx=None, channel_name=nan, slice_idx=None, pos_idx=None, extra_field=None, ext='.tiff', int2str_len=3)

Create an image name given parameters and extension. This function is custom for the computational microscopy (SMS) group, whose file naming convention is assumed to be: img_channelname_t***_p***_z***_extrafield.tif

Parameters:
  • time_idx (int) – Time index

  • channel_name (str/NaN) – Channel name

  • slice_idx (int) – Slice (z) index

  • pos_idx (int) – Position (FOV) index

  • extra_field (str) – Any extra string you want to include in the name

  • ext (str) – Extension starting with period, default ‘.tiff’

  • int2str_len (int) – Length of string of the converted integers

Return str im_name:

Image file name
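A sketch of the SMS-style name assembly under the convention stated above (a simplified stand-in, not the group's exact code):

```python
def get_sms_im_name(time_idx=None, channel_name=None, slice_idx=None,
                    pos_idx=None, extra_field=None, ext='.tiff', int2str_len=3):
    # Builds img_channelname_t***_p***_z***[_extrafield].ext
    name = 'img'
    if channel_name is not None:
        name += '_' + str(channel_name)
    for prefix, idx in [('t', time_idx), ('p', pos_idx), ('z', slice_idx)]:
        if idx is not None:
            name += '_{}{}'.format(prefix, str(idx).zfill(int2str_len))
    if extra_field is not None:
        name += '_' + extra_field
    return name + ext
```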

micro_dl.utils.aux_utils.get_sorted_names(dir_name)

Get image names in directory and sort them by their indices

Parameters:

dir_name (str) – Image directory name

Return list of strs im_names:

Image names sorted according to indices
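Sorting by embedded indices differs from plain lexicographic order once indices have different digit counts. A sketch of index-aware sorting (an assumption about how the sort key works, not the library's exact code):

```python
import re

def get_sorted_names(im_names):
    # Sort image names by the integer indices embedded in them,
    # compared left to right, rather than as plain strings
    def index_key(name):
        return [int(i) for i in re.findall(r'\d+', name)]
    return sorted(im_names, key=index_key)
```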

micro_dl.utils.aux_utils.get_sub_meta(frames_metadata, time_ids, channel_ids, slice_ids, pos_ids)

Get sliced metadata dataframe given variable indices

Parameters:
  • frames_metadata (dataframe) – Dataframe with column names given below

  • time_ids (int/list) – Timepoint indices

  • channel_ids (int/list) – Channel indices

  • slice_ids (int/list) – Slice (z) indices

  • pos_ids (int/list) – Position (FOV) indices

Return dataframe sub_meta:

Sliced metadata with rows matching the indices above

micro_dl.utils.aux_utils.import_object(module_name, obj_name, obj_type='class')

Imports a class or function dynamically

Parameters:
  • module_name (str) – modules such as input, utils, train etc

  • obj_name (str) – Object to find

  • obj_type (str) – Object type (class or function)

micro_dl.utils.aux_utils.init_logger(logger_name, log_fname, log_level)

Creates a logger instance

Parameters:
  • logger_name (str) – name of the logger instance

  • log_fname (str) – fname with full path of the log file

  • log_level (int) – specifies the logging level: NOTSET:0, DEBUG:10, INFO:20, WARNING:30, ERROR:40, CRITICAL:50

micro_dl.utils.aux_utils.make_dataframe(nbr_rows=None, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'])

Create empty frames metadata pandas dataframe given number of rows and standard column names defined below

Parameters:
  • nbr_rows ([None, int]) – The number of rows in the dataframe

  • df_names (list) – Dataframe column names

Return dataframe frames_meta:

Empty dataframe with given indices and column names

micro_dl.utils.aux_utils.parse_idx_from_name(im_name, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'], dir_name=None, order='cztp')

Assumes im_name is e.g. im_c***_z***_p***_t***.png. It doesn’t care about the extension or the number of digits each index is represented by; it extracts all integers from the image file name and assigns them by order. By default it assumes that the order is c, z, t, p.

Parameters:
  • im_name (str) – Image name without path

  • df_names (list of strs) – Dataframe col names

  • dir_name (str) – Directory path

  • order (str) – Order in which c, z, t, p are given in the image (4 chars)

Return dict meta_row:

One row of metadata given image file name
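The integer extraction can be sketched with a regex (a simplified stand-in for the real parser; the full meta row would also carry channel_name, dir_name and file_name). Here the name is assumed to list its indices in the default 'cztp' order:

```python
import re

def parse_idx_from_name(im_name, order='cztp'):
    # Extract all integers from the name and assign them by order
    ints = [int(i) for i in re.findall(r'\d+', im_name)]
    assert len(ints) == len(order), "unexpected number of indices in name"
    key = {'c': 'channel_idx', 'z': 'slice_idx',
           't': 'time_idx', 'p': 'pos_idx'}
    return {key[ch]: idx for ch, idx in zip(order, ints)}
```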

micro_dl.utils.aux_utils.parse_sms_name(im_name, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'], dir_name=None, channel_names=[])

Parse metadata from file name or file path. This function is custom for the computational microscopy (SMS) group, whose file naming convention is assumed to be: img_channelname_t***_p***_z***.tif. Note that this function alters the channel_names list in place.

Parameters:
  • im_name (str) – File name or path

  • df_names (list of strs) – Dataframe col names

  • dir_name (str) – Directory path

  • channel_names (list[str]) – Expanding list of channel names

Return dict meta_row:

One row of metadata given image file name

micro_dl.utils.aux_utils.read_config(config_fname)

Read the config file in yml format. TODO: validate config!

Parameters:

config_fname (str) – fname of config yaml with its full path

Returns:

dict config: Configuration parameters

micro_dl.utils.aux_utils.read_json(json_filename)

Read JSON file and validate schema

Parameters:

json_filename (str) – json file name

Returns:

dict json_object: JSON object

Raises:
  • FileNotFoundError – if file can’t be read

  • JSONDecodeError – if file is not in json format

micro_dl.utils.aux_utils.read_meta(input_dir, meta_fname='frames_meta.csv')

Read metadata file, which is assumed to be named ‘frames_meta.csv’ in given directory.

Parameters:
  • input_dir (str) – Directory containing data and metadata

  • meta_fname (str) – Metadata file name

Return dataframe frames_metadata:

Metadata for all frames

Raises:

IOError: If metadata file isn’t present

micro_dl.utils.aux_utils.save_tile_meta(tiles_meta, cur_channel, tiled_dir)

Save meta data for tiled images

Parameters:
  • tiles_meta (list) – List of tuples holding meta info for tiled images

  • cur_channel (int) – Channel being tiled

  • tiled_dir (str) – Directory to save meta data in

micro_dl.utils.aux_utils.sort_meta_by_channel(frames_metadata)

Rearrange metadata dataframe from all channels being listed in the same column to moving file names for each channel to separate columns.

Parameters:

frames_metadata (dataframe) – Metadata with one column named ‘file_name’

Return dataframe sorted_metadata:

Metadata with separate file_name_X for channel X.

micro_dl.utils.aux_utils.validate_config(config_dict, params)

Check if the required params are present in config

Parameters:
  • config_dict (dict) – dictionary with params as keys

  • params (list) – list of strings with expected params

Returns:

list with bool values indicating if param is present or not
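The check described above reduces to a presence test per expected key (a minimal sketch of the stated return value, not the library's exact code, which may additionally raise on missing params):

```python
def validate_config(config_dict, params):
    # One bool per expected param, in the order given
    return [param in config_dict for param in params]
```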

micro_dl.utils.aux_utils.validate_indices(frames_meta, preprocess_config, idx_type)

Helper function to check if a list of position, time or slice indices in the preprocessing config exist in the frames metadata. If not, use all indices in metadata.

Parameters:
  • frames_meta (pd.DataFrame) – Metadata for all images

  • preprocess_config (dict) – Preprocessing config

  • idx_type (str) – Type of index: ‘pos’, ‘time’, ‘slice’

Return list use_ids:

Indices to be used in preprocessing

Raises:

AssertionError – If indices in preprocess config is not a subset of those found in frames metadata

micro_dl.utils.aux_utils.validate_metadata_indices(frames_metadata, time_ids=None, channel_ids=None, slice_ids=None, pos_ids=None, uniform_structure=True)

Check the availability of indices provided timepoints, channels, positions and slices for all data. If input ids are None, the indices for that parameter will not be evaluated. If input ids are -1, all indices for that parameter will be returned.

Parameters:
  • frames_metadata (pd.DataFrame) – DF with columns time_idx, channel_idx, slice_idx, pos_idx, file_name]

  • time_ids (int/list) – check availability of these timepoints in frames_metadata

  • channel_ids (int/list) – check availability of these channels in frames_metadata

  • pos_ids (int/list) – Check availability of positions in metadata

  • slice_ids (int/list) – Check availability of z slices in metadata

  • uniform_structure (bool) – bool indicator if unequal quantities in any of the ids (channel, time, slice, pos)

Return dict metadata_ids:

All indices found given input

Raises:

AssertionError: If not all channels, timepoints, positions or slices are present

micro_dl.utils.aux_utils.write_json(json_dict, json_filename)

Writes dict as json file.

Parameters:
  • json_dict (dict) – Dictionary to be written

  • json_filename (str) – Full path file name of json

micro_dl.utils.image_utils module

Utility functions for processing images

micro_dl.utils.image_utils.apply_flat_field_correction(input_image, **kwargs)

Apply flat field correction.

Parameters:
  • input_image (np.array) – image to be corrected

  • **kwargs – See below

Returns:

np.array (float) corrected image

Keyword arguments:
  • flat_field_image (np.array) – flat field image for correction OR

  • flat_field_path (str) – Full path to flatfield image

micro_dl.utils.image_utils.center_crop_to_shape(input_image, output_shape, image_format='zyx')

Center crop the image to a given shape

Parameters:
  • input_image (np.array) – input image to be cropped

  • output_shape (list) – desired crop shape

  • image_format (str) – Image format; zyx or xyz

Return np.array center_block:

Center of input image with output shape

micro_dl.utils.image_utils.crop2base(im, base=2)

Crop image to the nearest smaller power of the base (usually 2). Assumes xyz format; works for zyx too, but then x_shape, y_shape and z_shape correspond to z_shape, y_shape and x_shape respectively.

Parameters:
  • im (nd.array) – Image

  • base (int) – Base to use, typically 2

  • crop_z (bool) – crop along z dim, only for UNet3D

Return nd.array im:

Cropped image

Raises:

AssertionError: if base is less than zero
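Interpreting the crop target as the largest power of the base that fits each spatial dimension, a sketch using plain nested lists (the center-crop choice is an assumption for illustration; the real function operates on numpy arrays):

```python
def crop2base(im, base=2):
    # im: 2D array-like (list of lists)
    def largest_power(length):
        # Largest power of `base` that is <= length
        p = 1
        while p * base <= length:
            p *= base
        return p
    h, w = len(im), len(im[0])
    nh, nw = largest_power(h), largest_power(w)
    # Keep the center of the image (illustrative choice)
    r0, c0 = (h - nh) // 2, (w - nw) // 2
    return [row[c0:c0 + nw] for row in im[r0:r0 + nh]]
```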

micro_dl.utils.image_utils.fit_polynomial_surface_2D(sample_coords, sample_values, im_shape, order=2, normalize=True)

Given coordinates and corresponding values, this function will fit a 2D polynomial of given order, then create a surface of given shape.

Parameters:
  • sample_coords (np.array) – 2D sample coords (nbr of points, 2)

  • sample_values (np.array) – Corresponding intensity values (nbr points,)

  • im_shape (tuple) – Shape of desired output surface (height, width)

  • order (int) – Order of polynomial (default 2)

  • normalize (bool) – Normalize surface by dividing by its mean for flatfield correction (default True)

Return np.array poly_surface:

2D surface of shape im_shape
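A numpy-based sketch of the fit (assumptions: coordinates are (x, y) pairs with x as column index, and the monomial basis x^i * y^j with i + j <= order; the real implementation may differ in conventions):

```python
import numpy as np

def fit_polynomial_surface_2d(sample_coords, sample_values, im_shape,
                              order=2, normalize=True):
    x, y = sample_coords[:, 0], sample_coords[:, 1]
    # Monomial terms x^i * y^j with i + j <= order (6 terms for order 2)
    terms = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
    design = np.stack([x ** i * y ** j for i, j in terms], axis=1)
    coeffs, _, _, _ = np.linalg.lstsq(design, sample_values, rcond=None)
    # Evaluate the fitted polynomial over the full image grid
    yy, xx = np.mgrid[0:im_shape[0], 0:im_shape[1]]
    surface = sum(c * xx ** i * yy ** j for c, (i, j) in zip(coeffs, terms))
    if normalize:
        # Divide by the mean so the surface can serve as a flatfield
        surface = surface / surface.mean()
    return surface
```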

micro_dl.utils.image_utils.get_flat_field_path(flat_field_dir, channel_idx, channel_ids)

Given channel and flatfield dir, check that corresponding flatfield is present and returns its path.

Parameters:
  • flat_field_dir (str) – Flatfield directory

  • channel_idx (int) – Channel index for flatfield

  • channel_ids (list) – All channel indices being processed

micro_dl.utils.image_utils.grid_sample_pixel_values(im, grid_spacing)

Sample pixel values in the input image at the grid. Any incomplete grids (remainders of modulus operation) will be ignored.

Parameters:
  • im (np.array) – 2D image

  • grid_spacing (int) – spacing of the grid

Return int row_ids:

row indices of the grids

Return int col_ids:

column indices of the grids

Return np.array sample_values:

sampled pixel values
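The grid sampling can be sketched as below (an assumption that sampling starts one grid spacing in from the origin; edge remainders from the modulus are dropped automatically by the ranges):

```python
import numpy as np

def grid_sample_pixel_values(im, grid_spacing):
    # Sample at every grid_spacing-th row/column of a 2D image
    row_ids = np.arange(grid_spacing, im.shape[0], grid_spacing)
    col_ids = np.arange(grid_spacing, im.shape[1], grid_spacing)
    rows, cols = np.meshgrid(row_ids, col_ids, indexing='ij')
    sample_values = im[rows, cols]
    return row_ids, col_ids, sample_values
```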

micro_dl.utils.image_utils.im_adjust(img, tol=1, bit=8)

Adjust contrast of the image

micro_dl.utils.image_utils.im_bit_convert(im, bit=16, norm=False, limit=[])

micro_dl.utils.image_utils.preprocess_image(im, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)

Do histogram clipping, z score normalization, and potentially binarization.

Parameters:
  • im (np.array) – Image (stack)

  • hist_clip_limits (tuple) – Percentile histogram clipping limits

  • is_mask (bool) – True if mask

  • normalize_im (str/None) – Normalization, if any

  • zscore_mean (float/None) – Data mean

  • zscore_std (float/None) – Data std

micro_dl.utils.image_utils.preprocess_imstack(frames_metadata, depth, time_idx, channel_idx, slice_idx, pos_idx, dir_name=None, flat_field_path=None, hist_clip_limits=None, normalize_im='stack')

Preprocess image given by indices: flatfield correction, histogram clipping and z-score normalization is performed.

Parameters:
  • frames_metadata (pd.DataFrame) – DF with meta info for all images

  • depth (int) – num of slices in stack if 2.5D or depth for 3D

  • time_idx (int) – Time index

  • channel_idx (int) – Channel index

  • slice_idx (int) – Slice (z) index

  • pos_idx (int) – Position (FOV) index

  • dir_name (str/None) – Image directory (none if using the frames_meta dir_name)

  • flat_field_path (str) – Path to flat field image for channel

  • hist_clip_limits (list) – Limits for histogram clipping (size 2)

  • normalize_im (str or None) – options to z-score the image

Return np.array im:

3D preprocessed image

micro_dl.utils.image_utils.read_image(file_path)

Read 2D grayscale image from file. Checks file extension for npy and loads the array if so. Otherwise reads a regular image using OpenCV (png, tif, jpg; see OpenCV for supported files) of any bit depth.

Parameters:

file_path (str) – Full path to image

Return array im:

2D image

Raises:

IOError: if image can’t be opened

micro_dl.utils.image_utils.read_image_from_row(meta_row, dir_name=None)

Read 2D grayscale image from file. Checks file extension for npy and loads the array if so. Otherwise reads a regular image using OpenCV (png, tif, jpg; see OpenCV for supported files) of any bit depth.

Parameters:
  • meta_row (pd.DataFrame) – Row in metadata

  • dir_name (str/None) – Directory containing images (none if using frames meta dir_name)

Return array im:

2D image

Raises:

IOError: if image can’t be opened

micro_dl.utils.image_utils.read_imstack(input_fnames, flat_field_fnames=None, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)

Read the images in the fnames and assemble a stack. If images are masks, they are made boolean by setting values > 0 to True.

Parameters:
  • input_fnames (tuple/list) – Paths to input files

  • flat_field_fnames (str/list) – Path(s) to flat field image(s)

  • hist_clip_limits (tuple) – limits for histogram clipping

  • is_mask (bool) – Indicator for if files contain masks

  • normalize_im (bool/None) – Whether to zscore normalize im stack

  • zscore_mean (float) – mean for z-scoring the image

  • zscore_std (float) – std for z-scoring the image

Return np.array:

Input stack, flat-field corrected and z-scored if regular images; boolean if they’re masks

micro_dl.utils.image_utils.read_imstack_from_meta(frames_meta_sub, dir_name=None, flat_field_fnames=None, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)

Read images (>1) from metadata rows and assemble a stack. If images are masks, they are made boolean by setting values > 0 to True.

Parameters:
  • frames_meta_sub (pd.DataFrame) – Selected subvolume to be read

  • dir_name (str/None) – Directory path (none if using dir in frames_meta)

  • flat_field_fnames (str/list) – Path(s) to flat field image(s)

  • hist_clip_limits (tuple) – Percentile limits for histogram clipping

  • is_mask (bool) – Indicator for if files contain masks

  • normalize_im (bool/None) – Whether to zscore normalize im stack

  • zscore_mean (float) – mean for z-scoring the image

  • zscore_std (float) – std for z-scoring the image

Return np.array:

Input stack, flat-field corrected and z-scored if regular images; boolean if they’re masks

micro_dl.utils.image_utils.rescale_image(im, scale_factor)

Rescales a 2D image equally in x and y given a scale factor. Uses bilinear interpolation (the OpenCV default).

Parameters:
  • im (np.array) – 2D image

  • scale_factor (float) –

Return np.array:

2D image resized by scale factor

micro_dl.utils.image_utils.rescale_nd_image(input_volume, scale_factor)

Rescale a nd array, mainly used for 3D volume

For non-integer scaled dimensions, the values are rounded to the closest int. A factor of 0.5 is iffy: when downsampling the value gets floored, and when upsampling it gets rounded up to the next int.

Parameters:
  • input_volume (np.array) – 3D stack

  • scale_factor (float/list) – if scale_factor is a float, scale all dimensions by this. Else scale_factor has to be specified for each dimension in a list or tuple

Return np.array res_volume:

rescaled volume

micro_dl.utils.image_utils.resize_image(input_image, output_shape)

Resize image to a specified shape

Parameters:
  • input_image (np.ndarray) – image to be resized

  • output_shape (tuple/np.array) – desired shape of the output image

Returns:

np.array, resized image

micro_dl.utils.image_utils.resize_mask(input_image, target_size)

Resample label/bool images

micro_dl.utils.io_utils module

class micro_dl.utils.io_utils.DefaultZarr(store, root_path)

Bases: WriterBase

This writer is based off creating a default HCS hierarchy for non-hcs datasets. Currently, we decide that all positions will live under individual columns under a single row, i.e. this produces the following structure:

Dataset.zarr
└── Row_0
    ├── Col_0
    │   └── Pos_000
    ├── …
    └── Col_N
        └── Pos_N

We assume this structure in the metadata updating/position creation.

create_position(position, name)

Creates a column and position subgroup given the index and name. Name is provided by the main writer class

Parameters:
  • position (int) – Index of the position to create

  • name (str) – Name of the position subgroup

init_hierarchy()

method to init the default hierarchy. Will create the first row and initialize metadata fields

class micro_dl.utils.io_utils.ReaderBase

Bases: object

I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io

This will be updated if the io parts of waveorder are moved to a standalone python package.

get_array(position: int) ndarray
get_image(p, t, c, z) ndarray
get_num_positions() int
get_zarr(position: int) array
property shape
class micro_dl.utils.io_utils.WriterBase(store, root_path)

Bases: object

I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io

This will be updated if the io parts of waveorder are moved to a standalone python package. This is the ABC for all writer types.

create_channel_dict(chan_name, clim=None, first_chan=False)

This will create a dictionary used for OME-zarr metadata. Allows custom contrast limits and channel names for display. Defaults everything to grayscale.

Parameters:
  • chan_name (str) – Desired name of the channel for display

  • clim (tuple) – Contrast limits (start, end, min, max)

  • first_chan (bool) – Whether or not this is the first channel of the dataset (display will be set to active)

Return dict dict_:

Dictionary adherent to OME-zarr standards

create_column(row_idx, idx, name=None)

Creates a column in the hierarchy (second level below zarr store, one below row). Option to name this column; default is Col_{idx}. Keeps track of the column name + column index for later metadata creation.

Parameters:
  • row_idx (int) – Index of the row to place the column underneath

  • idx (int) – Index of the column (order in which it is placed)

  • name (str) – Optional name to replace default column name

create_position(position: int, name: str)
create_row(idx, name=None)

Creates a row in the hierarchy (first level below zarr store). Option to name this row; default is Row_{idx}. Keeps track of the row name + row index for later metadata creation.

Parameters:
  • idx (int) – Index of the row (order in which it is placed)

  • name (str) – Optional name to replace default row name

get_zarr()
init_array(data_shape, chunk_size, dtype, chan_names, clims, overwrite=False)

Initializes the zarr array under the current position subgroup. The array level is called ‘arr_0’ in the hierarchy. Sets omero/multiscales metadata based upon chan_names and clims.

Parameters:
  • data_shape (tuple) – Desired shape of your data (T, C, Z, Y, X). Must match data

  • chunk_size (tuple) – Desired chunk size (T, C, Z, Y, X). Chunking each image would be (1, 1, 1, Y, X)

  • dtype (str or np.dtype) – Data type, i.e. ‘uint16’ or np.uint16

  • chan_names (list) – List of strings corresponding to your channel names. Used for OME-zarr metadata

  • clims (list) – List of tuples of contrast limits per channel for OME-Zarr metadata; each tuple can be (start, end, min, max) or (start, end)

  • overwrite (bool) – Whether or not to overwrite existing data that may be present

init_hierarchy()
open_position(position: int)

Opens a position based upon the position index. It will navigate the rows/columns to find where this position is based off of the generated position map, which keeps track of this information. It will set current_pos_group to this position for writing the data.

Parameters:

position (int) – Index of the position you wish to open

set_channel_attributes(chan_names, clims=None)

A method for creating the ome-zarr metadata dictionary. Channel names are defined by the user; everything else is pre-defined.

Parameters:
  • chan_names (list) – List of channel names in the order of the channel dimensions, i.e. if 3D Phase is C = 0, list ‘3DPhase’ first

  • clims (list of tuples) – Contrast limits to display for every channel

set_root(root)

Set the root path of the zarr store. Used in the main writer class.

Parameters:

root (str) – Path to the zarr store (folder ending in .zarr)

set_store(store)

Sets the zarr store. Used in the main writer class.

Parameters:

store (Zarr StoreObject) – Opened zarr store at the highest level

set_verbosity(verbose: bool)
write(data, t, c, z)

Write data to specified index of initialized zarr array.

Parameters:
  • data (nd-array) – Data to be saved. Must be the shape that matches indices (T, C, Z, Y, X)

  • t (list) – Index or index slice of the time dimension

  • c (list) – Index or index slice of the channel dimension

  • z (list) – Index or index slice of the z dimension

class micro_dl.utils.io_utils.ZarrReader(zarrfile: str)

Bases: ReaderBase

I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io

Reader for HCS ome-zarr arrays. OME-zarr structure can be found here: https://ngff.openmicroscopy.org/0.1/ Also collects the HCS metadata so it can be later copied.

get_array(position)

Gets the (T, C, Z, Y, X) array at given position

Parameters:

position (int) – Position index

Return np.array pos:

Array of size (T, C, Z, Y, X) at specified position

get_image(p, t, c, z)

Returns the image at dimension P, T, C, Z

Parameters:
  • p (int) – Index of the position dimension

  • t (int) – Index of the time dimension

  • c (int) – Index of the channel dimension

  • z (int) – Index of the z dimension

Return np.array image:

Image at the given dimension of shape (Y, X)

get_image_plane_metadata(p, c, z)

For the sake of not keeping an enormous amount of metadata, only the microscope conditions for the first timepoint are kept in the zarr metadata during write. The user can only query image plane metadata at p, c, z.

Parameters:
  • p (int) – Position index

  • c (int) – Channel index

  • z (int) – Z-slice index

Return dict metadata:

Image Plane Metadata at given coordinate w/ T = 0

get_num_positions() int
get_zarr(position)

Returns the position-level zarr group array (not in memory)

Parameters:

position (int) – Position index

Return ZarrArray:

Zarr array containing the (T, C, Z, Y, X) array at given position

class micro_dl.utils.io_utils.ZarrWriter(save_dir: Optional[str] = None, hcs_meta: Optional[dict] = None, verbose: bool = False)

Bases: object

I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io

Given stokes or physical data, construct a standard hierarchy in zarr for output. Should conform to the ome-zarr standard as much as possible.

TODO: Allow for writing multiple positions in same store

create_zarr_root(name)

Method for creating the root zarr store. If the store already exists, it will raise an error. Name corresponds to the root directory name (highest level) zarr store.

Parameters:

name (str) – Name of the zarr store.

current_group_name = None
current_position = None
init_array(position, data_shape, chunk_size, chan_names, dtype='float32', clims=None, position_name=None, overwrite=False)

Creates a subgroup structure based on position index. Then initializes the zarr array under the current position subgroup. Array level is called ‘array’ in the hierarchy.

Parameters:
  • position (int) – Position index upon which to initialize array

  • data_shape (tuple) – Desired Shape of your data (T, C, Z, Y, X). Must match data

  • chunk_size (tuple) – Desired Chunk Size (T, C, Z, Y, X). Chunking each image would be (1, 1, 1, Y, X)

  • dtype (str) – Data Type, i.e. ‘uint16’

  • clims (list) – List of tuples corresponding to contrast limits for channel. OME-Zarr metadata

  • overwrite (bool) – Whether or not to overwrite the existing data that may be present.

Param list chan_names:

List of strings corresponding to your channel names. Used for OME-zarr metadata

store = None
write(data, p, t=None, c=None, z=None)

Wrapper that calls the builder’s write function. Will write to an existing array of zeros and place data over the specified indices.

Parameters:
  • data (np.array) – Data to be saved. Must be the shape that matches indices (T, C, Z, Y, X)

  • p (int) – Position index in which to write the data into

  • t (int/slice) – Time index or index range of the time dimension

  • c (int/slice) – Channel index or index range of the channel dimension

  • z (int/slice) – Slice index or index range of the Z-slice dimension

micro_dl.utils.masks module

micro_dl.utils.masks.create_otsu_mask(input_image, str_elem_size=3, thr=None, kernel_size=3, w_shed=False)

Create a binary mask using morphological operations. Opening removes small objects in the foreground.

Parameters:
  • input_image (np.array) – generate masks from this image

  • str_elem_size (int) – size of the structuring element. typically 3, 5

  • thr (float) – Threshold

  • kernel_size (int) – Kernel size

  • w_shed (bool) – Whether to use watershed

Returns:

mask of input_image, np.array

micro_dl.utils.masks.create_unimodal_mask(input_image, str_elem_size=3, kernel_size=3)

Create a mask with unimodal thresholding and morphological operations. Unimodal thresholding tends to oversegment, so the mask is eroded by a fraction.

Parameters:
  • input_image (np.array) – generate masks from this image

  • str_elem_size (int) – size of the structuring element. typically 3, 5

Return np.array:

Mask of input_image

micro_dl.utils.masks.get_unet_border_weight_map(annotation, w0=10, sigma=5)

Return weight map for borders as specified in the UNet paper. Note: the method below only works for UNet segmentation. TODO: Calculate boundaries directly and calculate distance from boundary of cells to another.

Parameters:
  • annotation (np.array) – A 2D array of shape (image_height, image_width) containing the annotation, with each class labeled as an integer

  • w0 (float) – Multiplier to the exponential distance loss, default 10 as mentioned in the UNet paper

  • sigma (float) – Standard deviation in the exponential distance term w0 * exp(-(d1 + d2)^2 / (2 * sigma^2)), default 5 as mentioned in the UNet paper

Return np.array:

Weight map for borders as specified in UNet
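The per-pixel weight term from the UNet paper, written out for a single pixel with d1 and d2 the distances to the two nearest cell borders (the full function computes these distance maps over the whole annotation):

```python
import math

def border_weight(d1, d2, w0=10.0, sigma=5.0):
    # UNet border weight: w0 * exp(-(d1 + d2)^2 / (2 * sigma^2))
    return w0 * math.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))
```

Pixels near two cell borders (small d1 + d2) receive weights close to w0, and the weight decays rapidly away from borders.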

micro_dl.utils.masks.get_unimodal_threshold(input_image)

Determines optimal unimodal threshold

https://users.cs.cf.ac.uk/Paul.Rosin/resources/papers/unimodal2.pdf https://www.mathworks.com/matlabcentral/fileexchange/45443-rosin-thresholding

Parameters:

input_image (np.array) – generate mask for this image

Return float best_threshold:

optimal lower threshold for the foreground hist
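Rosin (unimodal) thresholding draws a line from the histogram peak to the last non-empty bin and picks the bin farthest from that line. A sketch operating on a precomputed histogram (the real function computes the histogram from the image; assumes the histogram has a tail past its peak):

```python
def rosin_threshold(hist):
    # hist: list of bin counts for a unimodal histogram
    peak = max(range(len(hist)), key=lambda i: hist[i])
    last = max(i for i, h in enumerate(hist) if h > 0)
    if last == peak:
        return peak  # degenerate histogram, no tail
    x1, y1, x2, y2 = peak, hist[peak], last, hist[last]
    norm = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    best, best_d = peak, 0.0
    for i in range(peak, last + 1):
        # Perpendicular distance from (i, hist[i]) to the peak-to-tail line
        d = abs((y2 - y1) * i - (x2 - x1) * hist[i] + x2 * y1 - y2 * x1) / norm
        if d > best_d:
            best, best_d = i, d
    return best
```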

micro_dl.utils.meta_utils module

micro_dl.utils.meta_utils.compute_zscore_params(frames_meta, ints_meta, input_dir, normalize_im, min_fraction=0.99)

Compute median and interquartile range of intensities in blocks/tiles determined by the ints_meta_generator function (saved in intensity_meta.csv). Masks need to be computed, and only tiles with enough foreground given the masks (determined by min_fraction) will be included in the analysis.

Parameters:
  • frames_meta (pd.DataFrame) – Dataframe containing all metadata

  • ints_meta (pd.DataFrame) – Metadata containing intensity statistics for each z-slice and foreground fraction for masks

  • input_dir (str) – Directory containing images

  • normalize_im (None/str) – normalization scheme for input images

  • min_fraction (float) – Minimum foreground fraction of masks for computing intensity statistics.

Return pd.DataFrame frames_meta:

DataFrame containing all metadata

Return pd.DataFrame ints_meta:

Metadata containing intensity statistics for each z-slice

micro_dl.utils.meta_utils.frames_meta_from_filenames(input_dir, name_parser)

Extracts metadata (channel, position, time, slice) from file name.

Parameters:
  • input_dir (str) – path to input directory containing images

  • name_parser (str) – Function in aux_utils for parsing indices from file name

Return pd.DataFrame frames_meta:

Metadata for all frames in dataset

micro_dl.utils.meta_utils.frames_meta_from_zarr(input_dir, file_names)

Reads ome-zarr file and creates frames_meta based on metadata and array information. Assumes one zarr store per position according to OME guidelines.

Parameters:
  • input_dir (str) – Input directory

  • file_names (list) – List of full paths to all zarr files in dir

Return pd.DataFrame frames_meta:

Metadata for all frames in zarr

micro_dl.utils.meta_utils.frames_meta_generator(input_dir, file_format='zarr', name_parser='parse_sms_name')

Generate metadata from file names, or from embedded metadata in the case of zarr files, for preprocessing. Writes the found data to frames_metadata.csv in the input directory.

Naming convention for the default parser ‘parse_sms_name’: img_channelname_t***_p***_z***.tif

The file structure for ome-zarr files is described here: https://ngff.openmicroscopy.org/0.1/

Parameters:
  • input_dir (str) – path to input directory containing image data

  • file_format (str) – Image file format (‘zarr’ or ‘tiff’ or ‘png’)

  • name_parser (str) – Function in aux_utils for parsing indices from tiff/png file name

Return pd.DataFrame frames_meta:

Metadata for all frames in dataset
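The sms naming convention can be parsed with a regular expression. This is an illustrative re-implementation, not the library's parse_sms_name:

```python
import re

def parse_sms_name(file_name):
    """Hypothetical parser for the img_channelname_t***_p***_z***.tif
    convention: extract the channel name and time/position/slice indices."""
    m = re.match(r"img_(?P<channel>.+)_t(?P<t>\d+)_p(?P<p>\d+)_z(?P<z>\d+)\.", file_name)
    assert m is not None, "file name does not follow the sms convention"
    return {
        "channel_name": m.group("channel"),
        "time_idx": int(m.group("t")),
        "pos_idx": int(m.group("p")),
        "slice_idx": int(m.group("z")),
    }
```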

micro_dl.utils.meta_utils.ints_meta_generator(input_dir, channel_ids, num_workers=4, block_size=256, flat_field_dir=None)

Generate pixel intensity metadata for estimating image normalization parameters during the preprocessing step. Pixels are sub-sampled from the image following a grid pattern defined by block_size for efficient estimation of the median and interquartile range. Grid sampling is preferred over random sampling because of the spatial correlation in images. Writes the found data to ints_meta.csv in the input directory. Assumed default naming convention for tiff files: img_channelname_t***_p***_z***.tif (parse_sms_name)

Parameters:
  • input_dir (str) – path to input directory containing images

  • channel_ids (list) – Channel indices to process

  • num_workers (int) – number of workers for multiprocessing

  • block_size (int) – block size for the grid sampling pattern. Default value works well for 2048 X 2048 images.

  • flat_field_dir (str) – Directory containing flatfield images

micro_dl.utils.meta_utils.mask_meta_generator(input_dir, num_workers=4)

Generate mask metadata, such as foreground fraction, for the preprocessing step. Writes the found data to a metadata csv in the input directory. Assumed default file naming convention is:

img_channelname_t***_p***_z***.tif for parse_sms_name

Parameters:
  • input_dir (str) – path to input directory containing images

  • num_workers (int) – number of workers for multiprocessing

Return pd.DataFrame mask_meta:

Metadata with mask info

micro_dl.utils.mp_utils module

micro_dl.utils.mp_utils.create_save_mask(channels_meta_sub, flat_field_fnames, str_elem_radius, mask_dir, mask_channel_idx, int2str_len, mask_type, mask_ext, dir_name=None, channel_thrs=None)

Create and save mask. When more than one channel is used to generate the mask, a mask is generated for each channel and the masks are then combined.

Parameters:
  • channels_meta_sub (pd.DataFrame) – Metadata for given PTCZ

  • flat_field_fnames (list/None) – Paths to corresponding flat field images

  • str_elem_radius (int) – size of structuring element used for binary opening. str_elem: disk or ball

  • mask_dir (str) – dir to save masks

  • mask_channel_idx (int) – channel number of mask

  • int2str_len (int) – Length of str when converting ints

  • mask_type (str) – thresholding type used for masking or str to map to masking function

  • mask_ext (str) – ‘.npy’ or ‘.png’. For otsu and unimodal masks it is recommended to save as uint8 PNG; for borders_weight_loss_map masks, save as npy float64 to avoid loss from scaling to uint8.

  • dir_name (str/None) – Image directory (none if using frames_meta dir_name)

  • channel_thrs (list) – list of threshold for each channel to generate binary masks. Only used when mask_type is ‘dataset_otsu’

Return dict cur_meta:

For each mask, fg_frac is added to metadata
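The multi-channel combination can be sketched as a union of per-channel binary masks, with the foreground fraction then being the mean of the combined mask. An illustrative sketch, assuming boolean per-channel masks (not the library's exact implementation):

```python
import numpy as np

def combine_channel_masks(masks):
    """Union of per-channel binary masks: a pixel is foreground if it is
    foreground in any contributing channel."""
    combined = np.sum([m.astype(bool) for m in masks], axis=0) > 0
    # fg_frac, as added to the metadata, is the mean of the binary mask
    return combined, combined.mean()
```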

micro_dl.utils.mp_utils.crop_at_indices_save(meta_sub, flat_field_fname, hist_clip_limits, slice_idx, crop_indices, image_format, save_dir, dir_name=None, int2str_len=3, is_mask=False, tile_3d=False, normalize_im=True, zscore_mean=None, zscore_std=None)

Crop image into tiles at given indices and save.

Parameters:
  • meta_sub (pd.DataFrame) – Subset of metadata for images to be cropped

  • flat_field_fname (str) – File name of flat field image

  • hist_clip_limits (tuple) – limits for histogram clipping

  • slice_idx (int) – slice idx of input image

  • crop_indices (tuple) – tuple of indices for cropping

  • image_format (str) – zyx or xyz

  • save_dir (str) – output dir to save tiles

  • dir_name (str/None) – Input directory

  • int2str_len (int) – len of indices for creating file names

  • is_mask (bool) – Indicates if files are masks

  • tile_3d (bool) – indicator for tiling in 3D

  • normalize_im (bool/str) – Normalization method

  • zscore_mean (float/None) – Mean for normalization

  • zscore_std (float/None) – Std for normalization

Returns:

pd.DataFrame from a list of dicts with metadata

micro_dl.utils.mp_utils.get_im_stats(im_path)

Read an image and compute its intensity statistics.

Parameters:

im_path (str) – Full path to image

Return dict meta_row:

Dict with intensity data for image

micro_dl.utils.mp_utils.get_mask_meta_row(file_path, meta_row)

Given path to mask, read mask, compute foreground fraction and fill in corresponding metadata row.

Parameters:
  • file_path (str) – Path to binary mask image

  • meta_row (pd.DataFrame) – Metadata row to fill in

Return pd.DataFrame meta_row:

Metadata row with foreground fraction for mask

micro_dl.utils.mp_utils.mp_create_save_mask(fn_args, workers)

Create and save masks with multiprocessing

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned dicts from create_save_mask

micro_dl.utils.mp_utils.mp_crop_save(fn_args, workers)

Crop and save images with multiprocessing.

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned df from crop_at_indices_save

micro_dl.utils.mp_utils.mp_get_im_stats(fn_args, workers)

Read and compute statistics of images with multiprocessing.

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned dicts from get_im_stats

micro_dl.utils.mp_utils.mp_rescale_vol(fn_args, workers)

Rescale and save image stacks with multiprocessing.

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

micro_dl.utils.mp_utils.mp_resize_save(mp_args, workers)

Resize and save images with multiprocessing.

Parameters:
  • mp_args (list) – Function keyword arguments

  • workers (int) – max number of workers

micro_dl.utils.mp_utils.mp_sample_im_pixels(fn_args, workers)

Read and compute statistics of images with multiprocessing.

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned meta rows from sample_im_pixels

micro_dl.utils.mp_utils.mp_tile_save(fn_args, workers)

Tile and save with multiprocessing https://stackoverflow.com/questions/42074501/python-concurrent-futures-processpoolexecutor-performance-of-submit-vs-map

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned df from tile_and_save

micro_dl.utils.mp_utils.mp_wrapper(fn, fn_args, workers)

Generic wrapper for running a function with multiprocessing

Parameters:
  • fn (function) – Function to run with multiprocessing

  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of values returned by fn
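The fan-out pattern these mp_* wrappers share can be sketched with concurrent.futures. A thread pool is used here so the sketch stays runnable in any context, whereas the library fans work out over processes:

```python
import concurrent.futures

def run_parallel(fn, fn_args, workers):
    """Unpack each tuple of arguments into fn and collect the results
    in submission order (thread-pool stand-in for a process pool)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(fn, *args) for args in fn_args]
        return [f.result() for f in futures]
```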

micro_dl.utils.mp_utils.rescale_vol_and_save(time_idx, pos_idx, channel_idx, slice_start_idx, slice_end_idx, frames_metadata, dir_name, output_fname, scale_factor, ff_path)

Rescale volumes and save.

Parameters:
  • time_idx (int) – time point of input image

  • pos_idx (int) – sample idx of input image

  • channel_idx (int) – channel idx of input image

  • slice_start_idx (int) – start slice idx for the vol to be saved

  • slice_end_idx (int) – end slice idx for the vol to be saved

  • frames_metadata (pd.Dataframe) – metadata for the input slices

  • dir_name (str/None) – Image directory (none if using dir_name from frames_meta)

  • output_fname (str) – output_fname

  • scale_factor (float/list) – scale factor for resizing

  • ff_path (str/None) – path to flat field image

micro_dl.utils.mp_utils.resize_and_save(**kwargs)

Resize images and save them. Performs flatfield correction prior to resizing if flatfield images are present.

Parameters:

kwargs – Keyword arguments:

  • file_path (str) – Path to input image

  • write_path (str) – Path to which the resized image is written

  • scale_factor (float) – Scale factor for resizing

  • ff_path (str) – Path to flat field correction image

micro_dl.utils.mp_utils.sample_im_pixels(meta_row, ff_path, grid_spacing, dir_name=None)

Read an image and compute intensity statistics at each point of a grid. Grid spacing determines the distance in pixels between grid points along rows and cols. Applies flatfield correction prior to intensity sampling if a flatfield path is specified.

Parameters:
  • meta_row (dict) – Metadata row for image

  • ff_path (str) – Full path to flatfield image corresponding to image

  • grid_spacing (int) – Distance in pixels between sampling points

  • dir_name (str/None) – Image directory (none if using dir_name from frames_meta)

Return list meta_rows:

Dicts with intensity data for each grid point
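Grid sampling itself reduces to regular index slicing. A sketch with numpy, assuming a 2D image:

```python
import numpy as np

def sample_grid_pixels(image, grid_spacing):
    """Sample intensities on a regular grid; grid sampling covers the
    field of view evenly, which suits spatially correlated images."""
    rows = np.arange(0, image.shape[0], grid_spacing)
    cols = np.arange(0, image.shape[1], grid_spacing)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return image[rr, cc]
```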

micro_dl.utils.mp_utils.tile_and_save(meta_sub, flat_field_fname, hist_clip_limits, slice_idx, tile_size, step_size, min_fraction, image_format, save_dir, dir_name=None, int2str_len=3, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)

Tile an image based on given tile and step sizes and save the tiles.

Parameters:
  • meta_sub (pd.DataFrame) – Subset of metadata for images to be tiled

  • flat_field_fname (str) – fname of flat field image

  • hist_clip_limits (tuple) – limits for histogram clipping

  • slice_idx (int) – slice idx of input image

  • tile_size (list) – size of tile along row, col (& slices)

  • step_size (list) – step size along row, col (& slices)

  • min_fraction (float) – minimum foreground volume fraction required to keep a tile

  • image_format (str) – zyx / xyz

  • save_dir (str) – output dir to save tiles

  • dir_name (str/None) – Image directory

  • int2str_len (int) – len of indices for creating file names

  • is_mask (bool) – Indicates if files are masks

  • normalize_im (str/None) – Normalization method

  • zscore_mean (float/None) – Mean for normalization

  • zscore_std (float/None) – Std for normalization

Returns:

pd.DataFrame from a list of dicts with metadata

micro_dl.utils.network_utils module

micro_dl.utils.network_utils.create_activation_layer(activation_dict)

Get the keras activation / advanced activation

Parameters:

activation_dict (dict) – Nested dict with keys: type -> activation type and params -> dict activation related params such as alpha, theta, alpha_initializer, alpha_regularizer etc from advanced activations

Return keras.layer:

instance of activation layer

micro_dl.utils.network_utils.get_keras_layer(type, num_dims)

Get the 2D or 3D keras layer

Parameters:
  • type (str) – type of layer [conv, pooling, upsampling]

  • num_dims (int) – dimensionality of the image [2, 3]

Returns:

keras.layer

micro_dl.utils.network_utils.get_layer_shape(layer_shape, data_format)

Get the layer shape without the batch and channel dimensions

Parameters:
  • layer_shape (list) – output of layer.get_output_shape.as_list()

  • data_format (str) – in [channels_first, channels_last]

Returns:

np.array layer_shape_xyz - layer shape without batch and channel dimensions

micro_dl.utils.normalize module

Image normalization related functions

micro_dl.utils.normalize.hist_adapteq_2D(input_image, kernel_size=None, clip_limit=None)

CLAHE on 2D images

skimage.exposure.equalize_adapthist works only in 2D. Extend to 3D or use OpenCV? Not ideal, as it enhances noise in homogeneous areas.

Parameters:
  • input_image (np.array) – input image for intensity normalization

  • kernel_size (int/list) – Neighbourhood to be used for histogram equalization. If none, use default of 1/8th image size.

  • clip_limit (float) – Clipping limit, normalized between 0 and 1 (higher values give more contrast, ~ max percent of voxels in any histogram bin, if > this limit, the voxel intensities are redistributed). if None, default=0.01

micro_dl.utils.normalize.hist_clipping(input_image, min_percentile=2, max_percentile=98)

Clips and rescales histogram from min to max intensity percentiles

rescale_intensity with input check

Parameters:
  • input_image (np.array) – input image for intensity normalization

  • min_percentile (int/float) – min intensity percentile

  • max_percentile (int/float) – max intensity percentile

Returns:

np.float, intensity clipped and rescaled image
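Percentile clipping followed by rescaling can be sketched as follows (a simplified stand-in; the function name and the rescaling epsilon are assumptions for illustration):

```python
import numpy as np

def hist_clip(image, min_percentile=2, max_percentile=98):
    """Clip intensities at the given percentiles, then rescale to [0, 1]."""
    lo, hi = np.percentile(image, [min_percentile, max_percentile])
    clipped = np.clip(image, lo, hi)
    # Small epsilon guards against division by zero for flat images
    return (clipped - lo) / (hi - lo + 1e-8)
```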

micro_dl.utils.normalize.unzscore(im_norm, zscore_median, zscore_iqr)

Revert z-score normalization applied during preprocessing. Necessary before computing SSIM

Parameters:
  • im_norm – Normalized image for un-zscore

  • zscore_median – Image median

  • zscore_iqr – Image interquartile range

Return im:

image at its original scale

micro_dl.utils.normalize.zscore(input_image, im_mean=None, im_std=None)

Performs z-score normalization. Adds epsilon in denominator for robustness

Parameters:
  • input_image (np.array) – input image for intensity normalization

  • im_mean (float/None) – Image mean

  • im_std (float/None) – Image std

Return np.array norm_img:

z score normalized image
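The zscore/unzscore pair is a standard affine normalization and its inverse. A sketch, where EPS is an assumed value for the robustness epsilon (the library may use a different constant, and unzscore takes the median/IQR as center/scale):

```python
import numpy as np

EPS = 1e-8  # assumed epsilon added to the denominator for robustness

def zscore(image, im_mean=None, im_std=None):
    """Z-score normalize an image, falling back to its own statistics."""
    im_mean = np.mean(image) if im_mean is None else im_mean
    im_std = np.std(image) if im_std is None else im_std
    return (image - im_mean) / (im_std + EPS)

def unzscore(im_norm, center, scale):
    """Invert the normalization, e.g. before computing SSIM."""
    return im_norm * scale + center
```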

micro_dl.utils.preprocess_utils module

micro_dl.utils.preprocess_utils.get_preprocess_config(data_dir)
micro_dl.utils.preprocess_utils.validate_mask_meta(mask_dir, input_dir, csv_name=None, mask_channel=None)

If user provides existing masks, the mask directory should also contain a csv file (not named frames_meta.csv which is reserved for output) with two column names: mask_name and file_name. Each row should describe the mask name and the corresponding file name. Each file_name should exist in input_dir and belong to the same channel. This function checks that all file names exist in input_dir and writes a frames_meta csv containing mask names with indices corresponding to the matched file_name. It also assigns a mask channel number for future preprocessing steps like tiling.

Parameters:
  • mask_dir (str) – Mask directory

  • input_dir (str) – Input image directory, to match masks with images

  • csv_name (str/None) – Name of the csv file matching masks with images; required if more than one csv is present in mask_dir

  • mask_channel (int/None) – Channel idx assigned to masks

Return int mask_channel:

New channel index for masks for writing tiles

Raises:

IOError: If no csv file is present in mask_dir

Raises:

IOError: If more than one csv file exists in mask_dir and no csv_name is provided to resolve ambiguity

Raises:

AssertionError: If csv doesn’t consist of two columns named ‘mask_name’ and ‘file_name’

Raises:

IndexError: If unable to match file_name in mask_dir csv with file_name in input_dir for any given mask row

micro_dl.utils.tile_utils module

micro_dl.utils.tile_utils.crop_at_indices(input_image, crop_indices, save_dict=None, tile_3d=False)

Crop image into tiles at given indices.

Parameters:
  • input_image (np.array) – input image for cropping

  • crop_indices (list/tuple) – list of indices for cropping

  • save_dict (dict/None) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generation output fname

  • tile_3d (bool) – boolean flag for adding slice_start_idx to meta

Returns:

if not saving tiles: a list of tuples of (cropped image ID in the format r{rmin}-{rmax}_c{cmin}-{cmax}_sl{slmin}-{slmax}, cropped image). Else saves tiles in-place and returns a df with tile metadata

micro_dl.utils.tile_utils.tile_image(input_image, tile_size, step_size, return_index=False, min_fraction=None, save_dict=None)

Tiles the image based on given tile and step size. USE MIN_FRACTION WITH INPUT_IMAGE.DTYPE=bool / MASKS

Parameters:
  • input_image (np.array) – 3D input image to be tiled

  • tile_size (list/tuple/np array) – size of the blocks to be tiled from the image

  • step_size (list/tuple/np array) – size of the window shift. In case of no overlap, the step size is tile_size. If overlap, step_size < tile_size

  • return_index (bool) – indicator for returning tile indices

  • min_fraction (float) – Minimum fraction of foreground in mask for including tile

  • save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generation output fname

Returns:

if not saving: a list of tuples of (tile ID in the format r{rmin}-{rmax}_c{cmin}-{cmax}_sl{slmin}-{slmax}, tiled image). Else: saves tiles in-place and returns a df with tile metadata. If return_index=True: additionally returns a list of tuples of crop indices
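Enumerating tile windows from tile and step sizes can be sketched as follows (2D case only; the library additionally handles 3D tiling and min_fraction filtering):

```python
def tile_indices(im_shape, tile_size, step_size):
    """Enumerate (row_start, row_end, col_start, col_end) windows covering
    a 2D image; the final window is shifted back so it stays in bounds."""
    def starts(dim, tile, step):
        s = list(range(0, dim - tile + 1, step))
        if s[-1] + tile < dim:  # shift the last tile to the image edge
            s.append(dim - tile)
        return s
    return [
        (r, r + tile_size[0], c, c + tile_size[1])
        for r in starts(im_shape[0], tile_size[0], step_size[0])
        for c in starts(im_shape[1], tile_size[1], step_size[1])
    ]
```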

micro_dl.utils.tile_utils.write_meta(tiled_metadata, save_dict)

Write meta for tiles from an image as a csv

Parameters:
  • tiled_metadata (list) – list of meta dicts

  • save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generation output fname

micro_dl.utils.tile_utils.write_tile(tile, file_name, save_dict)

Write tile function that can be called using threading.

Parameters:
  • tile (np.array) – one tile

  • file_name (str) – File name for tile (must be .npy format)

  • save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx,

Return str op_fname:

filename used for saving the tile with entire path

micro_dl.utils.train_utils module

Utility functions used for training

micro_dl.utils.train_utils.check_gpu_availability(gpu_id)

Check if mem_frac is available in given gpu_id

Parameters:
  • gpu_id (int/list) – id of the gpu to be used. Int for single GPU training, list for distributed training

  • gpu_mem_frac (list) – mem fraction for each GPU in gpu_id

Return bool gpu_availability:

True if all mem_fracs are greater than gpu_mem_frac

Return list curr_mem_frac:

list of current memory fractions available for the GPUs in gpu_id

micro_dl.utils.train_utils.get_loss(loss_str)

Get loss type from config

micro_dl.utils.train_utils.get_metrics(metrics_list)

Get the metrics from config

micro_dl.utils.train_utils.select_gpu(gpu_ids=None, gpu_mem_frac=None)

Find the GPU ID with highest available memory fraction. If ID is given as input, set the gpu_mem_frac to maximum available, or if a memory fraction is given, make sure the given GPU has the desired memory fraction available. Currently only supports single GPU runs.

Parameters:
  • gpu_ids (int) – Desired GPU ID. If None, find GPU with the most memory available.

  • gpu_mem_frac (float) – Desired GPU memory fraction [0, 1]. If None, use the maximum available amount of GPU memory.

Return int gpu_ids:

GPU ID to use.

Return float cur_mem_frac:

GPU memory fraction to use

Raises:

NotImplementedError: If gpu_ids is not int

Raises:

AssertionError: If requested memory fraction isn’t available

micro_dl.utils.train_utils.set_keras_session(gpu_ids, gpu_mem_frac)

Set the Keras session

Module contents

Module for utility functions