micro_dl.utils module

Submodules

micro_dl.utils.aux_utils module

micro_dl.utils.aux_utils.adjust_slice_margins(slice_ids, depth)

Adjusts slice indices to given z depth by removing indices too close to boundaries. Assumes that slice indices are contiguous.

Parameters:
  • slice_ids (list of ints) – Slice (z) indices

  • depth (int) – Number of z slices

Return list slice_ids:

Slice indices with adjusted margins

Raises:
  • AssertionError – if depth is even

  • AssertionError – if there aren’t enough slice ids for given depth

  • AssertionError – if slices aren’t contiguous
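The margin logic described above can be sketched as follows (a minimal illustration under the stated assumptions, not the actual micro_dl implementation): the first and last depth // 2 indices are too close to the boundary to center a z stack on, so they are dropped.

```python
def adjust_slice_margins(slice_ids, depth):
    # Depth must be odd so a stack can be centered on a slice
    assert depth % 2 == 1, "depth must be odd"
    margin = depth // 2
    assert len(slice_ids) > 2 * margin, "not enough slice ids for given depth"
    # Assumes contiguous indices
    assert slice_ids == list(range(slice_ids[0], slice_ids[-1] + 1)), \
        "slice ids must be contiguous"
    # Drop indices too close to the boundaries
    return slice_ids[margin:-margin] if margin else slice_ids
```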

micro_dl.utils.aux_utils.convert_channel_names_to_ids(channel_map, channel_list)

Given a channel map from get_channels and a list of channel names, return the corresponding list of channel indices.

Parameters:
  • channel_map (dict) – Channel names with indices

  • channel_list (list) – List of channel names (subset of channel_map). If the list already contains ints, it is returned as is.

Return list channel_ids:

List of (int) channel indices

Raises:

AssertionError – if any channel in list is not in channel_map
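A minimal sketch of this conversion (hypothetical stand-in, not the library's exact code):

```python
def convert_channel_names_to_ids(channel_map, channel_list):
    # If the list already contains ints, return it unchanged
    if all(isinstance(c, int) for c in channel_list):
        return channel_list
    for name in channel_list:
        assert name in channel_map, "{} not in channel_map".format(name)
    return [channel_map[name] for name in channel_list]
```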

micro_dl.utils.aux_utils.get_channel_axis(data_format)

Get the channel axis given the data format

Parameters:

data_format (str) – One of [channels_first, channels_last]

Return int channel_axis:

Channel axis given the data format

micro_dl.utils.aux_utils.get_channels(frames_meta)

Find channel names and their corresponding indices in the frames metadata.

Parameters:

frames_meta (pd.DataFrame) – Metadata for frames

Return dict channel_map:

Channel name and corresponding index

Raises:

AssertionError – if channel name column is incompletely populated

micro_dl.utils.aux_utils.get_im_name(time_idx=None, channel_idx=None, slice_idx=None, pos_idx=None, extra_field=None, ext='.png', int2str_len=3)

Create an image name given parameters and extension

Parameters:
  • time_idx (int) – Time index

  • channel_idx (int) – Channel index

  • slice_idx (int) – Slice (z) index

  • pos_idx (int) – Position (FOV) index

  • extra_field (str) – Any extra string you want to include in the name

  • ext (str) – Extension, e.g. ‘.png’ or ‘.npy’

  • int2str_len (int) – Length of string of the converted integers

Return str im_name:

Image file name
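The resulting name is of the form im_c###_z###_t###_p###.png with zero-padded indices. A sketch of the assembly (the exact field order is an assumption based on parse_idx_from_name's default 'cztp' order, and omitted indices are simply skipped):

```python
def get_im_name(time_idx=None, channel_idx=None, slice_idx=None,
                pos_idx=None, extra_field=None, ext='.png', int2str_len=3):
    # Append each provided index, zero-padded to int2str_len digits
    name = 'im'
    for prefix, idx in [('c', channel_idx), ('z', slice_idx),
                        ('t', time_idx), ('p', pos_idx)]:
        if idx is not None:
            name += '_{}{}'.format(prefix, str(idx).zfill(int2str_len))
    if extra_field is not None:
        name += '_' + extra_field
    return name + ext
```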

micro_dl.utils.aux_utils.get_meta_idx(frames_metadata, time_idx, channel_idx, slice_idx, pos_idx)

Get row index in metadata dataframe given variable indices

Parameters:
  • frames_metadata (dataframe) – Dataframe with column names given below

  • time_idx (int) – Timepoint index

  • channel_idx (int) – Channel index

  • slice_idx (int) – Slice (z) index

  • pos_idx (int) – Position (FOV) index

Returns:

int row_idx: Row index matching the indices above
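Using a plain list of dicts in place of the DataFrame, the lookup this describes is essentially (an illustrative sketch, not the pandas-based implementation):

```python
def get_meta_idx(frames_metadata, time_idx, channel_idx, slice_idx, pos_idx):
    # frames_metadata: list of dicts standing in for the metadata dataframe;
    # return the index of the first row matching all four indices
    for i, row in enumerate(frames_metadata):
        if (row['time_idx'] == time_idx
                and row['channel_idx'] == channel_idx
                and row['slice_idx'] == slice_idx
                and row['pos_idx'] == pos_idx):
            return i
    return None
```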

micro_dl.utils.aux_utils.get_row_idx(frames_metadata, time_idx, channel_idx, slice_idx=-1, pos_idx=-1, dir_names=None)

Get the indices for images with timepoint_idx and channel_idx

Parameters:
  • frames_metadata (pd.DataFrame) – DF with columns [time_idx, channel_idx, slice_idx, file_name]

  • time_idx (int) – get info for this timepoint

  • channel_idx (int) – get info for this channel

  • slice_idx (int) – get info for this focal plane (2D)

  • pos_idx (int) – Specify FOV (default to all if -1)

  • dir_names (str) – Directory names, if not in dataframe

Return row_idx:

Row index in dataframe

micro_dl.utils.aux_utils.get_sms_im_name(time_idx=None, channel_name=nan, slice_idx=None, pos_idx=None, extra_field=None, ext='.tiff', int2str_len=3)

Create an image name given parameters and extension. This function is custom for the computational microscopy (SMS) group, whose file naming convention is assumed to be: img_channelname_t***_p***_z***_extrafield.tif

Parameters:
  • time_idx (int) – Time index

  • channel_name (str/NaN) – Channel name

  • slice_idx (int) – Slice (z) index

  • pos_idx (int) – Position (FOV) index

  • extra_field (str) – Any extra string you want to include in the name

  • ext (str) – Extension starting with period, default ‘.tiff’

  • int2str_len (int) – Length of string of the converted integers

Return str im_name:

Image file name
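A sketch of the SMS-style name assembly under the convention stated above (a simplified stand-in, not the group's exact code):

```python
def get_sms_im_name(time_idx=None, channel_name=None, slice_idx=None,
                    pos_idx=None, extra_field=None, ext='.tiff', int2str_len=3):
    # Builds img_channelname_t***_p***_z***[_extrafield].ext
    name = 'img'
    if channel_name is not None:
        name += '_' + str(channel_name)
    for prefix, idx in [('t', time_idx), ('p', pos_idx), ('z', slice_idx)]:
        if idx is not None:
            name += '_{}{}'.format(prefix, str(idx).zfill(int2str_len))
    if extra_field is not None:
        name += '_' + extra_field
    return name + ext
```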

micro_dl.utils.aux_utils.get_sorted_names(dir_name)

Get image names in directory and sort them by their indices

Parameters:

dir_name (str) – Image directory name

Return list of strs im_names:

Image names sorted according to indices
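Sorting by embedded indices differs from plain lexicographic order once indices have different digit counts. A sketch of index-aware sorting (an assumption about how the sort key works, not the library's exact code):

```python
import re

def get_sorted_names(im_names):
    # Sort image names by the integer indices embedded in them,
    # compared left to right, rather than as plain strings
    def index_key(name):
        return [int(i) for i in re.findall(r'\d+', name)]
    return sorted(im_names, key=index_key)
```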

micro_dl.utils.aux_utils.get_sub_meta(frames_metadata, time_ids, channel_ids, slice_ids, pos_ids)

Get sliced metadata dataframe given variable indices

Parameters:
  • frames_metadata (dataframe) – Dataframe with column names given below

  • time_ids (int/list) – Timepoint indices

  • channel_ids (int/list) – Channel indices

  • slice_ids (int/list) – Slice (z) indices

  • pos_ids (int/list) – Position (FOV) indices

Return dataframe sub_meta:

Sliced metadata with rows matching the indices above

micro_dl.utils.aux_utils.import_object(module_name, obj_name, obj_type='class')

Imports a class or function dynamically

Parameters:
  • module_name (str) – modules such as input, utils, train etc

  • obj_name (str) – Object to find

  • obj_type (str) – Object type (class or function)

micro_dl.utils.aux_utils.init_logger(logger_name, log_fname, log_level)

Creates a logger instance

Parameters:
  • logger_name (str) – name of the logger instance

  • log_fname (str) – fname with full path of the log file

  • log_level (int) – specifies the logging level: NOTSET:0, DEBUG:10, INFO:20, WARNING:30, ERROR:40, CRITICAL:50

micro_dl.utils.aux_utils.make_dataframe(nbr_rows=None, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'])

Create empty frames metadata pandas dataframe given number of rows and standard column names defined below

Parameters:
  • nbr_rows ([None, int]) – The number of rows in the dataframe

  • df_names (list) – Dataframe column names

Return dataframe frames_meta:

Empty dataframe with given indices and column names

micro_dl.utils.aux_utils.parse_idx_from_name(im_name, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'], dir_name=None, order='cztp')

Assumes im_name is e.g. im_c***_z***_p***_t***.png. It doesn’t care about the extension or the number of digits each index is represented by; it extracts all integers from the image file name and assigns them by order. By default it assumes that the order is c, z, t, p.

Parameters:
  • im_name (str) – Image name without path

  • df_names (list of strs) – Dataframe col names

  • dir_name (str) – Directory path

  • order (str) – Order in which c, z, t, p are given in the image (4 chars)

Return dict meta_row:

One row of metadata given image file name
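The integer extraction can be sketched with a regex (a simplified stand-in for the real parser; the full meta row would also carry channel_name, dir_name and file_name). Here the name is assumed to list its indices in the default 'cztp' order:

```python
import re

def parse_idx_from_name(im_name, order='cztp'):
    # Extract all integers from the name and assign them by order
    ints = [int(i) for i in re.findall(r'\d+', im_name)]
    assert len(ints) == len(order), "unexpected number of indices in name"
    key = {'c': 'channel_idx', 'z': 'slice_idx',
           't': 'time_idx', 'p': 'pos_idx'}
    return {key[ch]: idx for ch, idx in zip(order, ints)}
```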

micro_dl.utils.aux_utils.parse_sms_name(im_name, df_names=['channel_idx', 'pos_idx', 'slice_idx', 'time_idx', 'channel_name', 'dir_name', 'file_name'], dir_name=None, channel_names=[])

Parse metadata from file name or file path. This function is custom for the computational microscopy (SMS) group, whose file naming convention is assumed to be: img_channelname_t***_p***_z***.tif. Note that this function alters the channel_names list in place.

Parameters:
  • im_name (str) – File name or path

  • df_names (list of strs) – Dataframe col names

  • dir_name (str) – Directory path

  • channel_names (list[str]) – Expanding list of channel names

Return dict meta_row:

One row of metadata given image file name

micro_dl.utils.aux_utils.read_config(config_fname)

Read the config file in yml format. TODO: validate config!

Parameters:

config_fname (str) – fname of config yaml with its full path

Returns:

dict config: Configuration parameters

micro_dl.utils.aux_utils.read_json(json_filename)

Read JSON file and validate schema

Parameters:

json_filename (str) – json file name

Returns:

dict json_object: JSON object

Raises:
  • FileNotFoundError – if file can’t be read

  • JSONDecodeError – if file is not in json format

micro_dl.utils.aux_utils.read_meta(input_dir, meta_fname='frames_meta.csv')

Read metadata file, which is assumed to be named ‘frames_meta.csv’ in given directory.

Parameters:
  • input_dir (str) – Directory containing data and metadata

  • meta_fname (str) – Metadata file name

Return dataframe frames_metadata:

Metadata for all frames

Raises:

IOError: If metadata file isn’t present

micro_dl.utils.aux_utils.save_tile_meta(tiles_meta, cur_channel, tiled_dir)

Save meta data for tiled images

Parameters:
  • tiles_meta (list) – List of tuples holding meta info for tiled images

  • cur_channel (int) – Channel being tiled

  • tiled_dir (str) – Directory to save meta data in

micro_dl.utils.aux_utils.sort_meta_by_channel(frames_metadata)

Rearrange metadata dataframe from all channels being listed in the same column to moving file names for each channel to separate columns.

Parameters:

frames_metadata (dataframe) – Metadata with one column named ‘file_name’

Return dataframe sorted_metadata:

Metadata with separate file_name_X for channel X.

micro_dl.utils.aux_utils.validate_config(config_dict, params)

Check if the required params are present in config

Parameters:
  • config_dict (dict) – dictionary with params as keys

  • params (list) – list of strings with expected params

Returns:

list with bool values indicating if param is present or not
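The check described above reduces to a presence test per expected key (a minimal sketch of the stated return value, not the library's exact code, which may additionally raise on missing params):

```python
def validate_config(config_dict, params):
    # One bool per expected param, in the order given
    return [param in config_dict for param in params]
```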

micro_dl.utils.aux_utils.validate_indices(frames_meta, preprocess_config, idx_type)

Helper function to check if a list of position, time or slice indices in the preprocessing config exist in the frames metadata. If not, use all indices in metadata.

Parameters:
  • frames_meta (pd.DataFrame) – Metadata for all images

  • preprocess_config (dict) – Preprocessing config

  • idx_type (str) – Type of index: ‘pos’, ‘time’, ‘slice’

Return list use_ids:

Indices to be used in preprocessing

Raises:

AssertionError – If indices in preprocess config is not a subset of those found in frames metadata

micro_dl.utils.aux_utils.validate_metadata_indices(frames_metadata, time_ids=None, channel_ids=None, slice_ids=None, pos_ids=None, uniform_structure=True)

Check the availability of indices provided timepoints, channels, positions and slices for all data. If input ids are None, the indices for that parameter will not be evaluated. If input ids are -1, all indices for that parameter will be returned.

Parameters:
  • frames_metadata (pd.DataFrame) – DF with columns time_idx, channel_idx, slice_idx, pos_idx, file_name]

  • time_ids (int/list) – check availability of these timepoints in frames_metadata

  • channel_ids (int/list) – check availability of these channels in frames_metadata

  • pos_ids (int/list) – Check availability of positions in metadata

  • slice_ids (int/list) – Check availability of z slices in metadata

  • uniform_structure (bool) – bool indicator if unequal quantities in any of the ids (channel, time, slice, pos)

Return dict metadata_ids:

All indices found given input

Raises:

AssertionError: If not all channels, timepoints, positions or slices are present

micro_dl.utils.aux_utils.write_json(json_dict, json_filename)

Writes dict as json file.

Parameters:
  • json_dict (dict) – Dictionary to be written

  • json_filename (str) – Full path file name of json

micro_dl.utils.image_utils module

Utility functions for processing images

micro_dl.utils.image_utils.apply_flat_field_correction(input_image, **kwargs)

Apply flat field correction.

Parameters:
  • input_image (np.array) – image to be corrected

  • **kwargs – See below

Returns:

np.array (float) corrected image

Keyword arguments:
  • flat_field_image (np.array) – flat field image for correction OR

  • flat_field_path (str) – Full path to flatfield image

micro_dl.utils.image_utils.center_crop_to_shape(input_image, output_shape, image_format='zyx')

Center crop the image to a given shape

Parameters:
  • input_image (np.array) – input image to be cropped

  • output_shape (list) – desired crop shape

  • image_format (str) – Image format; zyx or xyz

Return np.array center_block:

Center of input image with output shape

micro_dl.utils.image_utils.crop2base(im, base=2)

Crop image to the nearest smaller power of the base (usually 2). Assumes xyz format; works for zyx too, but then x_shape, y_shape and z_shape correspond to z_shape, y_shape and x_shape respectively.

Parameters:
  • im (nd.array) – Image

  • base (int) – Base to use, typically 2

  • crop_z (bool) – crop along z dim, only for UNet3D

Return nd.array im:

Cropped image

Raises:

AssertionError: if base is less than zero
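Interpreting the crop target as the largest power of the base that fits each spatial dimension, a sketch using plain nested lists (the center-crop choice is an assumption for illustration; the real function operates on numpy arrays):

```python
def crop2base(im, base=2):
    # im: 2D array-like (list of lists)
    def largest_power(length):
        # Largest power of `base` that is <= length
        p = 1
        while p * base <= length:
            p *= base
        return p
    h, w = len(im), len(im[0])
    nh, nw = largest_power(h), largest_power(w)
    # Keep the center of the image (illustrative choice)
    r0, c0 = (h - nh) // 2, (w - nw) // 2
    return [row[c0:c0 + nw] for row in im[r0:r0 + nh]]
```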

micro_dl.utils.image_utils.fit_polynomial_surface_2D(sample_coords, sample_values, im_shape, order=2, normalize=True)

Given coordinates and corresponding values, this function will fit a 2D polynomial of given order, then create a surface of given shape.

Parameters:
  • sample_coords (np.array) – 2D sample coords (nbr of points, 2)

  • sample_values (np.array) – Corresponding intensity values (nbr points,)

  • im_shape (tuple) – Shape of desired output surface (height, width)

  • order (int) – Order of polynomial (default 2)

  • normalize (bool) – Normalize surface by dividing by its mean for flatfield correction (default True)

Return np.array poly_surface:

2D surface of shape im_shape
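A numpy-based sketch of the fit (assumptions: coordinates are (x, y) pairs with x as column index, and the monomial basis x^i * y^j with i + j <= order; the real implementation may differ in conventions):

```python
import numpy as np

def fit_polynomial_surface_2d(sample_coords, sample_values, im_shape,
                              order=2, normalize=True):
    x, y = sample_coords[:, 0], sample_coords[:, 1]
    # Monomial terms x^i * y^j with i + j <= order (6 terms for order 2)
    terms = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
    design = np.stack([x ** i * y ** j for i, j in terms], axis=1)
    coeffs, _, _, _ = np.linalg.lstsq(design, sample_values, rcond=None)
    # Evaluate the fitted polynomial over the full image grid
    yy, xx = np.mgrid[0:im_shape[0], 0:im_shape[1]]
    surface = sum(c * xx ** i * yy ** j for c, (i, j) in zip(coeffs, terms))
    if normalize:
        # Divide by the mean so the surface can serve as a flatfield
        surface = surface / surface.mean()
    return surface
```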

micro_dl.utils.image_utils.get_flat_field_path(flat_field_dir, channel_idx, channel_ids)

Given channel and flatfield dir, check that corresponding flatfield is present and returns its path.

Parameters:
  • flat_field_dir (str) – Flatfield directory

  • channel_idx (int) – Channel index for flatfield

  • channel_ids (list) – All channel indices being processed

micro_dl.utils.image_utils.grid_sample_pixel_values(im, grid_spacing)

Sample pixel values in the input image at the grid. Any incomplete grids (remainders of modulus operation) will be ignored.

Parameters:
  • im (np.array) – 2D image

  • grid_spacing (int) – spacing of the grid

Return int row_ids:

row indices of the grids

Return int col_ids:

column indices of the grids

Return np.array sample_values:

sampled pixel values
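The grid sampling can be sketched as below (an assumption that sampling starts one grid spacing in from the origin; edge remainders from the modulus are dropped automatically by the ranges):

```python
import numpy as np

def grid_sample_pixel_values(im, grid_spacing):
    # Sample at every grid_spacing-th row/column of a 2D image
    row_ids = np.arange(grid_spacing, im.shape[0], grid_spacing)
    col_ids = np.arange(grid_spacing, im.shape[1], grid_spacing)
    rows, cols = np.meshgrid(row_ids, col_ids, indexing='ij')
    sample_values = im[rows, cols]
    return row_ids, col_ids, sample_values
```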

micro_dl.utils.image_utils.im_adjust(img, tol=1, bit=8)

Adjust contrast of the image

micro_dl.utils.image_utils.im_bit_convert(im, bit=16, norm=False, limit=[])

micro_dl.utils.image_utils.preprocess_image(im, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)

Do histogram clipping, z score normalization, and potentially binarization.

Parameters:
  • im (np.array) – Image (stack)

  • hist_clip_limits (tuple) – Percentile histogram clipping limits

  • is_mask (bool) – True if mask

  • normalize_im (str/None) – Normalization, if any

  • zscore_mean (float/None) – Data mean

  • zscore_std (float/None) – Data std

micro_dl.utils.image_utils.preprocess_imstack(frames_metadata, depth, time_idx, channel_idx, slice_idx, pos_idx, dir_name=None, flat_field_path=None, hist_clip_limits=None, normalize_im='stack')

Preprocess image given by indices: flatfield correction, histogram clipping and z-score normalization is performed.

Parameters:
  • frames_metadata (pd.DataFrame) – DF with meta info for all images

  • depth (int) – num of slices in stack if 2.5D or depth for 3D

  • time_idx (int) – Time index

  • channel_idx (int) – Channel index

  • slice_idx (int) – Slice (z) index

  • pos_idx (int) – Position (FOV) index

  • dir_name (str/None) – Image directory (none if using the frames_meta dir_name)

  • flat_field_path (str) – Path to flat field image for channel

  • hist_clip_limits (list) – Limits for histogram clipping (size 2)

  • normalize_im (str or None) – options to z-score the image

Return np.array im:

3D preprocessed image

micro_dl.utils.image_utils.read_image(file_path)

Read 2D grayscale image from file. Checks file extension for npy and loads the array if so. Otherwise reads a regular image using OpenCV (png, tif, jpg; see OpenCV for supported files) of any bit depth.

Parameters:

file_path (str) – Full path to image

Return array im:

2D image

Raises:

IOError: if image can’t be opened

micro_dl.utils.image_utils.read_image_from_row(meta_row, dir_name=None)

Read 2D grayscale image from file. Checks file extension for npy and loads the array if so. Otherwise reads a regular image using OpenCV (png, tif, jpg; see OpenCV for supported files) of any bit depth.

Parameters:
  • meta_row (pd.DataFrame) – Row in metadata

  • dir_name (str/None) – Directory containing images (none if using frames meta dir_name)

Return array im:

2D image

Raises:

IOError: if image can’t be opened

micro_dl.utils.image_utils.read_imstack(input_fnames, flat_field_fnames=None, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)

Read the images in the fnames and assemble a stack. If images are masks, they are made boolean by setting values > 0 to True.

Parameters:
  • input_fnames (tuple/list) – Paths to input files

  • flat_field_fnames (str/list) – Path(s) to flat field image(s)

  • hist_clip_limits (tuple) – limits for histogram clipping

  • is_mask (bool) – Indicator for if files contain masks

  • normalize_im (bool/None) – Whether to zscore normalize im stack

  • zscore_mean (float) – mean for z-scoring the image

  • zscore_std (float) – std for z-scoring the image

Return np.array:

Input stack, flat-field corrected and z-scored if regular images; boolean if they’re masks

micro_dl.utils.image_utils.read_imstack_from_meta(frames_meta_sub, dir_name=None, flat_field_fnames=None, hist_clip_limits=None, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)

Read images (>1) from metadata rows and assemble a stack. If images are masks, they are made boolean by setting values > 0 to True.

Parameters:
  • frames_meta_sub (pd.DataFrame) – Selected subvolume to be read

  • dir_name (str/None) – Directory path (none if using dir in frames_meta)

  • flat_field_fnames (str/list) – Path(s) to flat field image(s)

  • hist_clip_limits (tuple) – Percentile limits for histogram clipping

  • is_mask (bool) – Indicator for if files contain masks

  • normalize_im (bool/None) – Whether to zscore normalize im stack

  • zscore_mean (float) – mean for z-scoring the image

  • zscore_std (float) – std for z-scoring the image

Return np.array:

Input stack, flat-field corrected and z-scored if regular images; boolean if they’re masks

micro_dl.utils.image_utils.rescale_image(im, scale_factor)

Rescales a 2D image equally in x and y given a scale factor. Uses bilinear interpolation (the OpenCV default).

Parameters:
  • im (np.array) – 2D image

  • scale_factor (float) –

Return np.array:

2D image resized by scale factor

micro_dl.utils.image_utils.rescale_nd_image(input_volume, scale_factor)

Rescale a nd array, mainly used for 3D volume

For non-integer scaled dimensions, the values are rounded to the closest int. A factor of 0.5 is iffy: when downsampling the value gets floored, and when upsampling it gets rounded up to the next int.

Parameters:
  • input_volume (np.array) – 3D stack

  • scale_factor (float/list) – if scale_factor is a float, scale all dimensions by this. Else scale_factor has to be specified for each dimension in a list or tuple

Return np.array res_volume:

rescaled volume

micro_dl.utils.image_utils.resize_image(input_image, output_shape)

Resize image to a specified shape

Parameters:
  • input_image (np.ndarray) – image to be resized

  • output_shape (tuple/np.array) – desired shape of the output image

Returns:

np.array, resized image

micro_dl.utils.image_utils.resize_mask(input_image, target_size)

Resample label/bool images

micro_dl.utils.io_utils module

class micro_dl.utils.io_utils.DefaultZarr(store, root_path)

Bases: WriterBase

This writer is based off creating a default HCS hierarchy for non-hcs datasets. Currently, we decide that all positions will live under individual columns under a single row, i.e. this produces the following structure:

Dataset.zarr
└── Row_0
    ├── Col_0
    │   └── Pos_000
    ├── …
    └── Col_N
        └── Pos_N

We assume this structure in the metadata updating/position creation.

create_position(position, name)

Creates a column and position subgroup given the index and name. Name is provided by the main writer class

Parameters:
  • position (int) – Index of the position to create

  • name (str) – Name of the position subgroup

init_hierarchy()

method to init the default hierarchy. Will create the first row and initialize metadata fields

class micro_dl.utils.io_utils.ReaderBase

Bases: object

I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io

This will be updated if the io parts of waveorder are moved to a standalone python package.

get_array(position: int) ndarray
get_image(p, t, c, z) ndarray
get_num_positions() int
get_zarr(position: int) array
property shape
class micro_dl.utils.io_utils.WriterBase(store, root_path)

Bases: object

I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io

This will be updated if the io parts of waveorder are moved to a standalone python package. This is the ABC for all writer types.

create_channel_dict(chan_name, clim=None, first_chan=False)

This will create a dictionary used for OME-zarr metadata. Allows custom contrast limits and channel names for display. Defaults everything to grayscale.

Parameters:
  • chan_name (str) – Desired name of the channel for display

  • clim (tuple) – Contrast limits (start, end, min, max)

  • first_chan (bool) – Whether or not this is the first channel of the dataset (display will be set to active)

Return dict dict_:

Dictionary adherent to OME-zarr standards

create_column(row_idx, idx, name=None)

Creates a column in the hierarchy (second level below zarr store, one below row). Option to name this column; default is Col_{idx}. Keeps track of the column name + column index for later metadata creation.

Parameters:
  • row_idx (int) – Index of the row to place the column underneath

  • idx (int) – Index of the column (order in which it is placed)

  • name (str) – Optional name to replace default column name

create_position(position: int, name: str)
create_row(idx, name=None)

Creates a row in the hierarchy (first level below zarr store). Option to name this row; default is Row_{idx}. Keeps track of the row name + row index for later metadata creation.

Parameters:
  • idx (int) – Index of the row (order in which it is placed)

  • name (str) – Optional name to replace default row name

get_zarr()
init_array(data_shape, chunk_size, dtype, chan_names, clims, overwrite=False)

Initializes the zarr array under the current position subgroup. The array level is called ‘arr_0’ in the hierarchy. Sets omero/multiscales metadata based upon chan_names and clims.

Parameters:
  • data_shape (tuple) – Desired shape of your data (T, C, Z, Y, X). Must match data

  • chunk_size (tuple) – Desired chunk size (T, C, Z, Y, X). Chunking each image would be (1, 1, 1, Y, X)

  • dtype (str or np.dtype) – Data type, i.e. ‘uint16’ or np.uint16

  • chan_names (list) – List of strings corresponding to your channel names. Used for OME-zarr metadata

  • clims (list) – List of tuples of contrast limits per channel for OME-Zarr metadata; each tuple can be (start, end, min, max) or (start, end)

  • overwrite (bool) – Whether or not to overwrite existing data that may be present

init_hierarchy()
open_position(position: int)

Opens a position based upon the position index. It will navigate the rows/columns to find where this position is based off of the generated position map, which keeps track of this information. It will set current_pos_group to this position for writing the data.

Parameters:

position (int) – Index of the position you wish to open

set_channel_attributes(chan_names, clims=None)

A method for creating the ome-zarr metadata dictionary. Channel names are defined by the user; everything else is pre-defined.

Parameters:
  • chan_names (list) – List of channel names in the order of the channel dimensions, i.e. if 3D Phase is C = 0, list ‘3DPhase’ first

  • clims (list of tuples) – Contrast limits to display for every channel

set_root(root)

Set the root path of the zarr store. Used in the main writer class.

Parameters:

root (str) – Path to the zarr store (folder ending in .zarr)

set_store(store)

Sets the zarr store. Used in the main writer class.

Parameters:

store (Zarr StoreObject) – Opened zarr store at the highest level

set_verbosity(verbose: bool)
write(data, t, c, z)

Write data to specified index of initialized zarr array.

Parameters:
  • data (nd-array) – Data to be saved. Must be the shape that matches indices (T, C, Z, Y, X)

  • t (list) – Index or index slice of the time dimension

  • c (list) – Index or index slice of the channel dimension

  • z (list) – Index or index slice of the z dimension

class micro_dl.utils.io_utils.ZarrReader(zarrfile: str)

Bases: ReaderBase

I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io

Reader for HCS ome-zarr arrays. OME-zarr structure can be found here: https://ngff.openmicroscopy.org/0.1/ Also collects the HCS metadata so it can be later copied.

get_array(position)

Gets the (T, C, Z, Y, X) array at given position

Parameters:

position (int) – Position index

Return np.array pos:

Array of size (T, C, Z, Y, X) at specified position

get_image(p, t, c, z)

Returns the image at dimension P, T, C, Z

Parameters:
  • p (int) – Index of the position dimension

  • t (int) – Index of the time dimension

  • c (int) – Index of the channel dimension

  • z (int) – Index of the z dimension

Return np.array image:

Image at the given dimension of shape (Y, X)

get_image_plane_metadata(p, c, z)

For the sake of not keeping an enormous amount of metadata, only the microscope conditions for the first timepoint are kept in the zarr metadata during write. The user can only query image plane metadata at p, c, z.

Parameters:
  • p (int) – Position index

  • c (int) – Channel index

  • z (int) – Z-slice index

Return dict metadata:

Image Plane Metadata at given coordinate w/ T = 0

get_num_positions() int
get_zarr(position)

Returns the position-level zarr group array (not in memory)

Parameters:

position (int) – Position index

Return ZarrArray:

Zarr array containing the (T, C, Z, Y, X) array at given position

class micro_dl.utils.io_utils.ZarrWriter(save_dir: Optional[str] = None, hcs_meta: Optional[dict] = None, verbose: bool = False)

Bases: object

I/O classes for zarr data are directly copied from: https://github.com/mehta-lab/waveorder/tree/master/waveorder/io

Given stokes or physical data, construct a standard hierarchy in zarr for output. Should conform to the ome-zarr standard as much as possible.

TODO: Allow for writing multiple positions in same store

create_zarr_root(name)

Method for creating the root zarr store. If the store already exists, it will raise an error. Name corresponds to the root directory name (highest level) zarr store.

Parameters:

name (str) – Name of the zarr store.

current_group_name = None
current_position = None
init_array(position, data_shape, chunk_size, chan_names, dtype='float32', clims=None, position_name=None, overwrite=False)

Creates a subgroup structure based on position index. Then initializes the zarr array under the current position subgroup. Array level is called ‘array’ in the hierarchy.

Parameters:
  • position (int) – Position index upon which to initialize array

  • data_shape (tuple) – Desired Shape of your data (T, C, Z, Y, X). Must match data

  • chunk_size (tuple) – Desired Chunk Size (T, C, Z, Y, X). Chunking each image would be (1, 1, 1, Y, X)

  • dtype (str) – Data Type, i.e. ‘uint16’

  • clims (list) – List of tuples corresponding to contrast limits for channel. OME-Zarr metadata

  • overwrite (bool) – Whether or not to overwrite the existing data that may be present.

Param list chan_names:

List of strings corresponding to your channel names. Used for OME-zarr metadata

store = None
write(data, p, t=None, c=None, z=None)

Wrapper that calls the builder’s write function. Will write to an existing array of zeros and place data over the specified indices.

Parameters:
  • data (np.array) – Data to be saved. Must be the shape that matches indices (T, C, Z, Y, X)

  • p (int) – Position index in which to write the data into

  • t (int/slice) – Time index or index range of the time dimension

  • c (int/slice) – Channel index or index range of the channel dimension

  • z (int/slice) – Slice index or index range of the Z-slice dimension

micro_dl.utils.masks module

micro_dl.utils.masks.create_otsu_mask(input_image, str_elem_size=3, thr=None, kernel_size=3, w_shed=False)

Create a binary mask using morphological operations. Opening removes small objects in the foreground.

Parameters:
  • input_image (np.array) – generate masks from this image

  • str_elem_size (int) – size of the structuring element. typically 3, 5

  • thr (float) – Threshold

  • kernel_size (int) – Kernel size

  • w_shed (bool) – Whether to use watershed

Returns:

mask of input_image, np.array

micro_dl.utils.masks.create_unimodal_mask(input_image, str_elem_size=3, kernel_size=3)

Create a mask with unimodal thresholding and morphological operations. Unimodal thresholding tends to oversegment, so the mask is eroded by a fraction.

Parameters:
  • input_image (np.array) – generate masks from this image

  • str_elem_size (int) – size of the structuring element. typically 3, 5

Return np.array:

Mask of input_image

micro_dl.utils.masks.get_unet_border_weight_map(annotation, w0=10, sigma=5)

Return weight map for borders as specified in the UNet paper. Note: the method below only works for UNet segmentation. TODO: Calculate boundaries directly and calculate distance from boundary of cells to another.

Parameters:
  • annotation (np.array) – A 2D array of shape (image_height, image_width) containing the annotation, with each class labeled as an integer

  • w0 (float) – Multiplier to the exponential distance loss, default 10 as mentioned in the UNet paper

  • sigma (float) – Standard deviation in the exponential distance term w0 * exp(-(d1 + d2)^2 / (2 * sigma^2)), default 5 as mentioned in the UNet paper

Return np.array:

Weight map for borders as specified in UNet
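The per-pixel weight term from the UNet paper, written out for a single pixel with d1 and d2 the distances to the two nearest cell borders (the full function computes these distance maps over the whole annotation):

```python
import math

def border_weight(d1, d2, w0=10.0, sigma=5.0):
    # UNet border weight: w0 * exp(-(d1 + d2)^2 / (2 * sigma^2))
    return w0 * math.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))
```

Pixels near two cell borders (small d1 + d2) receive weights close to w0, and the weight decays rapidly away from borders.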

micro_dl.utils.masks.get_unimodal_threshold(input_image)

Determines optimal unimodal threshold

https://users.cs.cf.ac.uk/Paul.Rosin/resources/papers/unimodal2.pdf https://www.mathworks.com/matlabcentral/fileexchange/45443-rosin-thresholding

Parameters:

input_image (np.array) – generate mask for this image

Return float best_threshold:

optimal lower threshold for the foreground hist
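Rosin (unimodal) thresholding draws a line from the histogram peak to the last non-empty bin and picks the bin farthest from that line. A sketch operating on a precomputed histogram (the real function computes the histogram from the image; assumes the histogram has a tail past its peak):

```python
def rosin_threshold(hist):
    # hist: list of bin counts for a unimodal histogram
    peak = max(range(len(hist)), key=lambda i: hist[i])
    last = max(i for i, h in enumerate(hist) if h > 0)
    if last == peak:
        return peak  # degenerate histogram, no tail
    x1, y1, x2, y2 = peak, hist[peak], last, hist[last]
    norm = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    best, best_d = peak, 0.0
    for i in range(peak, last + 1):
        # Perpendicular distance from (i, hist[i]) to the peak-to-tail line
        d = abs((y2 - y1) * i - (x2 - x1) * hist[i] + x2 * y1 - y2 * x1) / norm
        if d > best_d:
            best, best_d = i, d
    return best
```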

micro_dl.utils.meta_utils module

micro_dl.utils.meta_utils.compute_zscore_params(frames_meta, ints_meta, input_dir, normalize_im, min_fraction=0.99)

Compute median and interquartile range of intensities in blocks/tiles determined by the ints_meta_generator function (saved in intensity_meta.csv). Masks need to be computed, and only tiles with enough foreground given the masks (determined by min_fraction) will be included in the analysis.

Parameters:
  • frames_meta (pd.DataFrame) – Dataframe containing all metadata

  • ints_meta (pd.DataFrame) – Metadata containing intensity statistics for each z-slice and foreground fraction for masks

  • input_dir (str) – Directory containing images

  • normalize_im (None/str) – normalization scheme for input images

  • min_fraction (float) – Minimum foreground fraction of masks for computing intensity statistics.

Return pd.DataFrame frames_meta:

DataFrame containing all metadata

Return pd.DataFrame ints_meta:

Metadata containing intensity statistics for each z-slice

micro_dl.utils.meta_utils.frames_meta_from_filenames(input_dir, name_parser)

Extracts metadata (channel, position, time, slice) from file name.

Parameters:
  • input_dir (str) – path to input directory containing images

  • name_parser (str) – Function in aux_utils for parsing indices from file name

Return pd.DataFrame frames_meta:

Metadata for all frames in dataset

micro_dl.utils.meta_utils.frames_meta_from_zarr(input_dir, file_names)

Reads ome-zarr file and creates frames_meta based on metadata and array information. Assumes one zarr store per position according to OME guidelines.

Parameters:
  • input_dir (str) – Input directory

  • file_names (list) – List of full paths to all zarr files in dir

Return pd.DataFrame frames_meta:

Metadata for all frames in zarr

micro_dl.utils.meta_utils.frames_meta_generator(input_dir, file_format='zarr', name_parser='parse_sms_name')

Generate metadata from file names, or from embedded metadata in the case of zarr files, for preprocessing. Writes the found data to frames_metadata.csv in the input directory.

Naming convention for the default parser ‘parse_sms_name’: img_channelname_t***_p***_z***.tif

The file structure for ome-zarr files is described here: https://ngff.openmicroscopy.org/0.1/

Parameters:
  • input_dir (str) – path to input directory containing image data

  • file_format (str) – Image file format (‘zarr’ or ‘tiff’ or ‘png’)

  • name_parser (str) – Function in aux_utils for parsing indices from tiff/png file name

Return pd.DataFrame frames_meta:

Metadata for all frames in dataset
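The sms naming convention can be parsed with a regular expression. This is an illustrative re-implementation, not the library's parse_sms_name:

```python
import re

def parse_sms_name(file_name):
    """Hypothetical parser for the img_channelname_t***_p***_z***.tif
    convention: extract the channel name and time/position/slice indices."""
    m = re.match(r"img_(?P<channel>.+)_t(?P<t>\d+)_p(?P<p>\d+)_z(?P<z>\d+)\.", file_name)
    assert m is not None, "file name does not follow the sms convention"
    return {
        "channel_name": m.group("channel"),
        "time_idx": int(m.group("t")),
        "pos_idx": int(m.group("p")),
        "slice_idx": int(m.group("z")),
    }
```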

micro_dl.utils.meta_utils.ints_meta_generator(input_dir, channel_ids, num_workers=4, block_size=256, flat_field_dir=None)

Generate pixel intensity metadata for estimating image normalization parameters during the preprocessing step. Pixels are sub-sampled from the image following a grid pattern defined by block_size for efficient estimation of the median and interquartile range. Grid sampling is preferred over random sampling because of the spatial correlation in images. Writes the found data to ints_meta.csv in the input directory. Assumed default naming convention for tiff files: img_channelname_t***_p***_z***.tif (parse_sms_name)

Parameters:
  • input_dir (str) – path to input directory containing images

  • channel_ids (list) – Channel indices to process

  • num_workers (int) – number of workers for multiprocessing

  • block_size (int) – block size for the grid sampling pattern. Default value works well for 2048 X 2048 images.

  • flat_field_dir (str) – Directory containing flatfield images

micro_dl.utils.meta_utils.mask_meta_generator(input_dir, num_workers=4)

Generate mask metadata, such as foreground fraction, for the preprocessing step. Writes the found data to a metadata csv in the input directory. Assumed default file naming convention is:

img_channelname_t***_p***_z***.tif for parse_sms_name

Parameters:
  • input_dir (str) – path to input directory containing images

  • num_workers (int) – number of workers for multiprocessing

Return pd.DataFrame mask_meta:

Metadata with mask info

micro_dl.utils.mp_utils module

micro_dl.utils.mp_utils.create_save_mask(channels_meta_sub, flat_field_fnames, str_elem_radius, mask_dir, mask_channel_idx, int2str_len, mask_type, mask_ext, dir_name=None, channel_thrs=None)

Create and save mask. When more than one channel is used to generate the mask, a mask is generated for each channel and the masks are then combined.

Parameters:
  • channels_meta_sub (pd.DataFrame) – Metadata for given PTCZ

  • flat_field_fnames (list/None) – Paths to corresponding flat field images

  • str_elem_radius (int) – size of structuring element used for binary opening. str_elem: disk or ball

  • mask_dir (str) – dir to save masks

  • mask_channel_idx (int) – channel number of mask

  • int2str_len (int) – Length of str when converting ints

  • mask_type (str) – thresholding type used for masking or str to map to masking function

  • mask_ext (str) – ‘.npy’ or ‘.png’. For otsu and unimodal masks it is recommended to save as uint8 PNG; for borders_weight_loss_map masks, save as npy float64 to avoid loss from scaling to uint8.

  • dir_name (str/None) – Image directory (none if using frames_meta dir_name)

  • channel_thrs (list) – list of threshold for each channel to generate binary masks. Only used when mask_type is ‘dataset_otsu’

Return dict cur_meta:

For each mask, fg_frac is added to metadata
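The multi-channel combination can be sketched as a union of per-channel binary masks, with the foreground fraction then being the mean of the combined mask. An illustrative sketch, assuming boolean per-channel masks (not the library's exact implementation):

```python
import numpy as np

def combine_channel_masks(masks):
    """Union of per-channel binary masks: a pixel is foreground if it is
    foreground in any contributing channel."""
    combined = np.sum([m.astype(bool) for m in masks], axis=0) > 0
    # fg_frac, as added to the metadata, is the mean of the binary mask
    return combined, combined.mean()
```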

micro_dl.utils.mp_utils.crop_at_indices_save(meta_sub, flat_field_fname, hist_clip_limits, slice_idx, crop_indices, image_format, save_dir, dir_name=None, int2str_len=3, is_mask=False, tile_3d=False, normalize_im=True, zscore_mean=None, zscore_std=None)

Crop image into tiles at given indices and save.

Parameters:
  • meta_sub (pd.DataFrame) – Subset of metadata for images to be cropped

  • flat_field_fname (str) – File name of flat field image

  • hist_clip_limits (tuple) – limits for histogram clipping

  • slice_idx (int) – slice idx of input image

  • crop_indices (tuple) – tuple of indices for cropping

  • image_format (str) – zyx or xyz

  • save_dir (str) – output dir to save tiles

  • dir_name (str/None) – Input directory

  • int2str_len (int) – len of indices for creating file names

  • is_mask (bool) – Indicates if files are masks

  • tile_3d (bool) – indicator for tiling in 3D

  • normalize_im (bool/str) – Normalization method

  • zscore_mean (float/None) – Mean for normalization

  • zscore_std (float/None) – Std for normalization

Returns:

pd.DataFrame from a list of dicts with metadata

micro_dl.utils.mp_utils.get_im_stats(im_path)

Read an image and compute its intensity statistics.

Parameters:

im_path (str) – Full path to image

Return dict meta_row:

Dict with intensity data for image

micro_dl.utils.mp_utils.get_mask_meta_row(file_path, meta_row)

Given path to mask, read mask, compute foreground fraction and fill in corresponding metadata row.

Parameters:
  • file_path (str) – Path to binary mask image

  • meta_row (pd.DataFrame) – Metadata row to fill in

Return pd.DataFrame meta_row:

Metadata row with foreground fraction for mask

micro_dl.utils.mp_utils.mp_create_save_mask(fn_args, workers)

Create and save masks with multiprocessing

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned dicts from create_save_mask

micro_dl.utils.mp_utils.mp_crop_save(fn_args, workers)

Crop and save images with multiprocessing.

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned df from crop_at_indices_save

micro_dl.utils.mp_utils.mp_get_im_stats(fn_args, workers)

Read and compute statistics of images with multiprocessing.

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned dicts from get_im_stats

micro_dl.utils.mp_utils.mp_rescale_vol(fn_args, workers)

Rescale and save image stacks with multiprocessing.

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

micro_dl.utils.mp_utils.mp_resize_save(mp_args, workers)

Resize and save images with multiprocessing.

Parameters:
  • mp_args (list) – Function keyword arguments

  • workers (int) – max number of workers

micro_dl.utils.mp_utils.mp_sample_im_pixels(fn_args, workers)

Read and compute statistics of images with multiprocessing.

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned meta rows from sample_im_pixels

micro_dl.utils.mp_utils.mp_tile_save(fn_args, workers)

Tile and save with multiprocessing https://stackoverflow.com/questions/42074501/python-concurrent-futures-processpoolexecutor-performance-of-submit-vs-map

Parameters:
  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of returned df from tile_and_save

micro_dl.utils.mp_utils.mp_wrapper(fn, fn_args, workers)

Generic wrapper for running a function with multiprocessing

Parameters:
  • fn (function) – Function to run with multiprocessing

  • fn_args (list of tuple) – list with tuples of function arguments

  • workers (int) – max number of workers

Returns:

list of values returned by fn
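The fan-out pattern these mp_* wrappers share can be sketched with concurrent.futures. A thread pool is used here so the sketch stays runnable in any context, whereas the library fans work out over processes:

```python
import concurrent.futures

def run_parallel(fn, fn_args, workers):
    """Unpack each tuple of arguments into fn and collect the results
    in submission order (thread-pool stand-in for a process pool)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(fn, *args) for args in fn_args]
        return [f.result() for f in futures]
```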

micro_dl.utils.mp_utils.rescale_vol_and_save(time_idx, pos_idx, channel_idx, slice_start_idx, slice_end_idx, frames_metadata, dir_name, output_fname, scale_factor, ff_path)

Rescale volumes and save.

Parameters:
  • time_idx (int) – time point of input image

  • pos_idx (int) – sample idx of input image

  • channel_idx (int) – channel idx of input image

  • slice_start_idx (int) – start slice idx for the vol to be saved

  • slice_end_idx (int) – end slice idx for the vol to be saved

  • frames_metadata (pd.Dataframe) – metadata for the input slices

  • dir_name (str/None) – Image directory (none if using dir_name from frames_meta)

  • output_fname (str) – output_fname

  • scale_factor (float/list) – scale factor for resizing

  • ff_path (str/None) – path to flat field image

micro_dl.utils.mp_utils.resize_and_save(**kwargs)

Resize images and save them. Performs flatfield correction prior to resizing if flatfield images are present.

Parameters:

kwargs – Keyword arguments:

  • file_path (str) – Path to input image

  • write_path (str) – Path to which the resized image is written

  • scale_factor (float) – Scale factor for resizing

  • ff_path (str) – Path to flat field correction image

micro_dl.utils.mp_utils.sample_im_pixels(meta_row, ff_path, grid_spacing, dir_name=None)

Read an image and compute intensity statistics at each point of a grid. Grid spacing determines the distance in pixels between grid points along rows and cols. Applies flatfield correction prior to intensity sampling if a flatfield path is specified.

Parameters:
  • meta_row (dict) – Metadata row for image

  • ff_path (str) – Full path to flatfield image corresponding to image

  • grid_spacing (int) – Distance in pixels between sampling points

  • dir_name (str/None) – Image directory (none if using dir_name from frames_meta)

Return list meta_rows:

Dicts with intensity data for each grid point
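Grid sampling itself reduces to regular index slicing. A sketch with numpy, assuming a 2D image:

```python
import numpy as np

def sample_grid_pixels(image, grid_spacing):
    """Sample intensities on a regular grid; grid sampling covers the
    field of view evenly, which suits spatially correlated images."""
    rows = np.arange(0, image.shape[0], grid_spacing)
    cols = np.arange(0, image.shape[1], grid_spacing)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return image[rr, cc]
```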

micro_dl.utils.mp_utils.tile_and_save(meta_sub, flat_field_fname, hist_clip_limits, slice_idx, tile_size, step_size, min_fraction, image_format, save_dir, dir_name=None, int2str_len=3, is_mask=False, normalize_im=None, zscore_mean=None, zscore_std=None)

Tile an image based on given tile and step sizes and save the tiles.

Parameters:
  • meta_sub (pd.DataFrame) – Subset of metadata for images to be tiled

  • flat_field_fname (str) – fname of flat field image

  • hist_clip_limits (tuple) – limits for histogram clipping

  • slice_idx (int) – slice idx of input image

  • tile_size (list) – size of tile along row, col (& slices)

  • step_size (list) – step size along row, col (& slices)

  • min_fraction (float) – minimum foreground volume fraction required to keep a tile

  • image_format (str) – zyx / xyz

  • save_dir (str) – output dir to save tiles

  • dir_name (str/None) – Image directory

  • int2str_len (int) – len of indices for creating file names

  • is_mask (bool) – Indicates if files are masks

  • normalize_im (str/None) – Normalization method

  • zscore_mean (float/None) – Mean for normalization

  • zscore_std (float/None) – Std for normalization

Returns:

pd.DataFrame from a list of dicts with metadata

micro_dl.utils.network_utils module

micro_dl.utils.network_utils.create_activation_layer(activation_dict)

Get the keras activation / advanced activation

Parameters:

activation_dict (dict) – Nested dict with keys: type -> activation type and params -> dict activation related params such as alpha, theta, alpha_initializer, alpha_regularizer etc from advanced activations

Return keras.layer:

instance of activation layer

micro_dl.utils.network_utils.get_keras_layer(type, num_dims)

Get the 2D or 3D keras layer

Parameters:
  • type (str) – type of layer [conv, pooling, upsampling]

  • num_dims (int) – dimensionality of the image [2, 3]

Returns:

keras.layer

micro_dl.utils.network_utils.get_layer_shape(layer_shape, data_format)

Get the layer shape without the batch and channel dimensions

Parameters:
  • layer_shape (list) – output of layer.get_output_shape.as_list()

  • data_format (str) – in [channels_first, channels_last]

Returns:

np.array layer_shape_xyz - layer shape without batch and channel dimensions

micro_dl.utils.normalize module

Image normalization related functions

micro_dl.utils.normalize.hist_adapteq_2D(input_image, kernel_size=None, clip_limit=None)

CLAHE on 2D images

skimage.exposure.equalize_adapthist works only in 2D. Extend to 3D or use OpenCV? Not ideal, as it enhances noise in homogeneous areas.

Parameters:
  • input_image (np.array) – input image for intensity normalization

  • kernel_size (int/list) – Neighbourhood to be used for histogram equalization. If none, use default of 1/8th image size.

  • clip_limit (float) – Clipping limit, normalized between 0 and 1 (higher values give more contrast, ~ max percent of voxels in any histogram bin, if > this limit, the voxel intensities are redistributed). if None, default=0.01

micro_dl.utils.normalize.hist_clipping(input_image, min_percentile=2, max_percentile=98)

Clips and rescales histogram from min to max intensity percentiles

rescale_intensity with input check

Parameters:
  • input_image (np.array) – input image for intensity normalization

  • min_percentile (int/float) – min intensity percentile

  • max_percentile (int/float) – max intensity percentile

Returns:

np.float, intensity clipped and rescaled image
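Percentile clipping followed by rescaling can be sketched as follows (a simplified stand-in; the function name and the rescaling epsilon are assumptions for illustration):

```python
import numpy as np

def hist_clip(image, min_percentile=2, max_percentile=98):
    """Clip intensities at the given percentiles, then rescale to [0, 1]."""
    lo, hi = np.percentile(image, [min_percentile, max_percentile])
    clipped = np.clip(image, lo, hi)
    # Small epsilon guards against division by zero for flat images
    return (clipped - lo) / (hi - lo + 1e-8)
```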

micro_dl.utils.normalize.unzscore(im_norm, zscore_median, zscore_iqr)

Revert z-score normalization applied during preprocessing. Necessary before computing SSIM

Parameters:
  • im_norm – Normalized image for un-zscore

  • zscore_median – Image median

  • zscore_iqr – Image interquartile range

Return im:

image at its original scale

micro_dl.utils.normalize.zscore(input_image, im_mean=None, im_std=None)

Performs z-score normalization. Adds epsilon in denominator for robustness

Parameters:
  • input_image (np.array) – input image for intensity normalization

  • im_mean (float/None) – Image mean

  • im_std (float/None) – Image std

Return np.array norm_img:

z score normalized image
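The zscore/unzscore pair is a standard affine normalization and its inverse. A sketch, where EPS is an assumed value for the robustness epsilon (the library may use a different constant, and unzscore takes the median/IQR as center/scale):

```python
import numpy as np

EPS = 1e-8  # assumed epsilon added to the denominator for robustness

def zscore(image, im_mean=None, im_std=None):
    """Z-score normalize an image, falling back to its own statistics."""
    im_mean = np.mean(image) if im_mean is None else im_mean
    im_std = np.std(image) if im_std is None else im_std
    return (image - im_mean) / (im_std + EPS)

def unzscore(im_norm, center, scale):
    """Invert the normalization, e.g. before computing SSIM."""
    return im_norm * scale + center
```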

micro_dl.utils.preprocess_utils module

micro_dl.utils.preprocess_utils.get_preprocess_config(data_dir)
micro_dl.utils.preprocess_utils.validate_mask_meta(mask_dir, input_dir, csv_name=None, mask_channel=None)

If user provides existing masks, the mask directory should also contain a csv file (not named frames_meta.csv which is reserved for output) with two column names: mask_name and file_name. Each row should describe the mask name and the corresponding file name. Each file_name should exist in input_dir and belong to the same channel. This function checks that all file names exist in input_dir and writes a frames_meta csv containing mask names with indices corresponding to the matched file_name. It also assigns a mask channel number for future preprocessing steps like tiling.

Parameters:
  • mask_dir (str) – Mask directory

  • input_dir (str) – Input image directory, to match masks with images

  • csv_name (str/None) – Name of the csv file matching masks with images; required if more than one csv is present in mask_dir

  • mask_channel (int/None) – Channel idx assigned to masks

Return int mask_channel:

New channel index for masks for writing tiles

Raises:

IOError: If no csv file is present in mask_dir

Raises:

IOError: If more than one csv file exists in mask_dir and no csv_name is provided to resolve ambiguity

Raises:

AssertionError: If csv doesn’t consist of two columns named ‘mask_name’ and ‘file_name’

Raises:

IndexError: If unable to match file_name in mask_dir csv with file_name in input_dir for any given mask row

micro_dl.utils.tile_utils module

micro_dl.utils.tile_utils.crop_at_indices(input_image, crop_indices, save_dict=None, tile_3d=False)

Crop image into tiles at given indices.

Parameters:
  • input_image (np.array) – input image for cropping

  • crop_indices (list/tuple) – list of indices for cropping

  • save_dict (dict/None) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generation output fname

  • tile_3d (bool) – boolean flag for adding slice_start_idx to meta

Returns:

if not saving tiles: a list of tuples of (cropped image ID in the format r{rmin}-{rmax}_c{cmin}-{cmax}_sl{slmin}-{slmax}, cropped image). Else saves tiles in-place and returns a df with tile metadata

micro_dl.utils.tile_utils.tile_image(input_image, tile_size, step_size, return_index=False, min_fraction=None, save_dict=None)

Tiles the image based on given tile and step size. USE MIN_FRACTION WITH INPUT_IMAGE.DTYPE=bool / MASKS

Parameters:
  • input_image (np.array) – 3D input image to be tiled

  • tile_size (list/tuple/np array) – size of the blocks to be tiled from the image

  • step_size (list/tuple/np array) – size of the window shift. In case of no overlap, the step size is tile_size. If overlap, step_size < tile_size

  • return_index (bool) – indicator for returning tile indices

  • min_fraction (float) – Minimum fraction of foreground in mask for including tile

  • save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generation output fname

Returns:

if not saving: a list of tuples of (tile ID in the format r{rmin}-{rmax}_c{cmin}-{cmax}_sl{slmin}-{slmax}, tiled image). Else: saves tiles in-place and returns a df with tile metadata. If return_index=True: additionally returns a list of tuples of crop indices
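Enumerating tile windows from tile and step sizes can be sketched as follows (2D case only; the library additionally handles 3D tiling and min_fraction filtering):

```python
def tile_indices(im_shape, tile_size, step_size):
    """Enumerate (row_start, row_end, col_start, col_end) windows covering
    a 2D image; the final window is shifted back so it stays in bounds."""
    def starts(dim, tile, step):
        s = list(range(0, dim - tile + 1, step))
        if s[-1] + tile < dim:  # shift the last tile to the image edge
            s.append(dim - tile)
        return s
    return [
        (r, r + tile_size[0], c, c + tile_size[1])
        for r in starts(im_shape[0], tile_size[0], step_size[0])
        for c in starts(im_shape[1], tile_size[1], step_size[1])
    ]
```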

micro_dl.utils.tile_utils.write_meta(tiled_metadata, save_dict)

Write meta for tiles from an image as a csv

Parameters:
  • tiled_metadata (list) – list of meta dicts

  • save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx, pos_idx, image_format and save_dir for generation output fname

micro_dl.utils.tile_utils.write_tile(tile, file_name, save_dict)

Write tile function that can be called using threading.

Parameters:
  • tile (np.array) – one tile

  • file_name (str) – File name for tile (must be .npy format)

  • save_dict (dict) – dict with keys: time_idx, channel_idx, slice_idx,

Return str op_fname:

filename used for saving the tile with entire path

micro_dl.utils.train_utils module

Utility functions used for training

micro_dl.utils.train_utils.check_gpu_availability(gpu_id)

Check if mem_frac is available in given gpu_id

Parameters:
  • gpu_id (int/list) – id of the gpu to be used. Int for single GPU training, list for distributed training

  • gpu_mem_frac (list) – mem fraction for each GPU in gpu_id

Return bool gpu_availability:

True if all mem_fracs are greater than gpu_mem_frac

Return list curr_mem_frac:

list of current memory fractions available for the GPUs in gpu_id

micro_dl.utils.train_utils.get_loss(loss_str)

Get loss type from config

micro_dl.utils.train_utils.get_metrics(metrics_list)

Get the metrics from config

micro_dl.utils.train_utils.select_gpu(gpu_ids=None, gpu_mem_frac=None)

Find the GPU ID with highest available memory fraction. If ID is given as input, set the gpu_mem_frac to maximum available, or if a memory fraction is given, make sure the given GPU has the desired memory fraction available. Currently only supports single GPU runs.

Parameters:
  • gpu_ids (int) – Desired GPU ID. If None, find GPU with the most memory available.

  • gpu_mem_frac (float) – Desired GPU memory fraction [0, 1]. If None, use the maximum available amount of GPU memory.

Return int gpu_ids:

GPU ID to use.

Return float cur_mem_frac:

GPU memory fraction to use

Raises:

NotImplementedError: If gpu_ids is not int

Raises:

AssertionError: If requested memory fraction isn’t available

micro_dl.utils.train_utils.set_keras_session(gpu_ids, gpu_mem_frac)

Set the Keras session

Module contents

Module for utility functions