lib.catalog.base.BaseCatalog

class lib.catalog.base.BaseCatalog(data=None, columns=None, attrs=None)

Bases: cosmopipe.lib.utils.ScatteredBaseClass

Base class that represents a catalog, as a dictionary of columns stored as arrays.

Initialize BaseCatalog.

Parameters

data (dict, BaseCatalog) – Dictionary mapping column name to array. If a BaseCatalog instance is given, its attributes are used to update self.
columns (list, default=None) – List of column names. Defaults to data.keys().
attrs (dict) – Other attributes.

Methods

`average`	Return global average of column(s) `column`, with weights `weights` (defaults to `1`).
`columns`	Return catalog column names, after optional selections.
`concatenate`	Concatenate catalogs together.
`copy`	Return copy, including column names `columns` (defaults to all columns).
`cov`	Estimate weighted covariance.
`deepcopy`
`eval`	Evaluate input `literal` and return results.
`extend`	Extend catalog with `other`.
`falses`	Return array of size `size` filled with `False`.
`from_array`	Build `BaseCatalog` from input `array`.
`from_nbodykit`	Build new catalog from nbodykit.
`from_state`	Instantiate and initialize class with state dictionary.
`full`	Return array of size `size` filled with `fill_value`.
`get`	Return catalog (local) column `column` if exists, else return provided default.
`gget`	Return on process rank `root` catalog global column `column` if exists, else return provided default.
`gindices`	Row numbers in the global catalog.
`gslice`	Perform global slicing of catalog.
`is_mpi_broadcast`
`is_mpi_gathered`
`is_mpi_root`
`is_mpi_scattered`
`load`	Load catalog in numpy binary format from disk.
`load_auto`	Load catalog from disk.
`load_fits`	Load catalog in fits binary format from disk.
`log_critical`
`log_debug`
`log_error`
`log_info`
`log_warning`
`maximum`	Return global maximum of column(s) `column`.
`mean`	Return global mean of column(s) `column`.
`median`	Return global median of column(s) `column`.
`minimum`	Return global minimum of column(s) `column`.
`mpi_broadcast`
`mpi_collect`	Return new instance corresponding to `self` on larger `mpicomm`.
`mpi_distribute`	Return new instance corresponding to `self` on smaller `mpicomm`.
`mpi_gather`	Gather catalog on a single process.
`mpi_recv`	Receive catalog from rank `source` with tag `tag`.
`mpi_scatter`	Scatter catalog on all processes.
`mpi_send`	Send catalog to rank `dest` with tag `tag`.
`mpi_to_state`	Return instance, changing current MPI state to `mpistate`.
`nans`	Return array of size `size` filled with `numpy.nan`.
`ones`	Return array of size `size` filled with one.
`percentile`	Return global percentiles of column(s) `column`.
`quantile`	Return global quantiles of column(s) `column`.
`save`	Save class to disk.
`save_auto`	Write catalog to disk.
`save_fits`	Save catalog to `filename` as fits file.
`set`	Set column of name `column`.
`std`	Estimate weighted standard deviation.
`sum`	Return global sum of column(s) `column`.
`to_array`	Return catalog as numpy array.
`to_nbodykit`	Return catalog in nbodykit format.
`to_stats`	Export catalog summary quantities.
`trues`	Return array of size `size` filled with `True`.
`var`	Estimate weighted parameter variance.
`zeros`	Return array of size `size` filled with zero.

Attributes

`gsize`	Return catalog global size, i.e. sum of size in each process.
`logger`
`mpiattrs`	MPI attributes
`mpistate`
`size`	Equivalent to `__len__()`.

average(column, weights=None): Return global average of column(s) column, with weights weights (defaults to 1).
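
To illustrate what a global weighted average involves when rows are spread over several processes, here is a minimal sketch in plain Python. Nested lists stand in for the local chunks held by each MPI process, and the helper name `global_average` is hypothetical; this is not the library's implementation.

```python
def global_average(local_columns, local_weights=None):
    """Weighted average over all chunks; weights default to 1."""
    if local_weights is None:
        local_weights = [[1.0] * len(chunk) for chunk in local_columns]
    # Each "process" reduces its own chunk to two numbers:
    # the weighted sum of values and the sum of weights.
    partial = [
        (sum(w * x for w, x in zip(ws, xs)), sum(ws))
        for xs, ws in zip(local_columns, local_weights)
    ]
    # The partial sums are then combined (an allreduce under MPI).
    total_wx = sum(p[0] for p in partial)
    total_w = sum(p[1] for p in partial)
    return total_wx / total_w


# Three "processes" holding 2, 1 and 3 rows of the same column.
chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
unweighted = global_average(chunks)
weighted = global_average(chunks, [[1.0, 1.0], [2.0], [0.0, 0.0, 0.0]])
```

Only two numbers per process need to be communicated, whatever the local chunk size.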

columns(include=None, exclude=None)

Return catalog column names, after optional selections.

Parameters

include (list, string, default=None) – Single or list of regex patterns to select column names to include. Defaults to all columns.
exclude (list, string, default=None) – Single or list of regex patterns to select column names to exclude. Defaults to no columns.

Returns

columns – Catalog column names, after optional selections.

Return type

list
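
The include/exclude selection described above can be sketched with the standard `re` module. The helper `select_columns` is hypothetical, and matching each pattern against the whole column name (`re.fullmatch`) is an assumption; the library may match differently.

```python
import re


def select_columns(names, include=None, exclude=None):
    """Filter `names` with regex patterns (hypothetical helper)."""

    def as_list(patterns):
        # Accept None, a single pattern, or a list of patterns.
        if patterns is None:
            return None
        return [patterns] if isinstance(patterns, str) else list(patterns)

    include, exclude = as_list(include), as_list(exclude)
    selected = []
    for name in names:
        # Assumption: a pattern must match the whole column name.
        keep = include is None or any(re.fullmatch(p, name) for p in include)
        drop = exclude is not None and any(re.fullmatch(p, name) for p in exclude)
        if keep and not drop:
            selected.append(name)
    return selected


names = ['Position', 'Velocity', 'Weight', 'WeightFKP']
weights_only = select_columns(names, include='Weight.*')
no_weights = select_columns(names, exclude=['Weight.*', 'Velocity'])
```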

classmethod concatenate(*others)

Concatenate catalogs together.

Parameters: others (list) – List of BaseCatalog instances.
Returns: new
Return type: BaseCatalog

Warning

The attrs of the returned catalog contain, for each key, the last value found in the attrs dictionaries of others.
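
The behaviour described in the warning is what successive `dict.update` calls produce. The sketch below uses plain `(columns, attrs)` tuples as hypothetical stand-ins for catalogs; it illustrates the semantics and is not the library's code.

```python
def concatenate(*catalogs):
    """Concatenate (columns, attrs) pairs standing in for catalogs."""
    columns, attrs = {}, {}
    for data, cat_attrs in catalogs:
        for name, values in data.items():
            # Rows of each column are appended in order.
            columns.setdefault(name, []).extend(values)
        # dict.update overwrites existing keys, so the last catalog wins.
        attrs.update(cat_attrs)
    return columns, attrs


first = ({'z': [0.1, 0.2]}, {'tracer': 'LRG', 'zmax': 1.0})
second = ({'z': [0.3]}, {'tracer': 'ELG'})
columns, attrs = concatenate(first, second)
```

Here `attrs['tracer']` ends up as `'ELG'`, while `'zmax'`, present only in the first catalog, is kept.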

copy(columns=None): Return copy, including column names columns (defaults to all columns).

cov(columns=None, fweights=None, aweights=None, ddof=1)

Estimate weighted covariance.

Parameters

columns (list, default=None) – Columns to compute covariance for.
fweights (array, int, default=None) – 1D array of integer frequency weights; the number of times each observation vector should be repeated.
aweights (array, default=None) – 1D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.
ddof (int, default=1) – Number of degrees of freedom.

Returns

cov – If a single column is provided as columns, returns the variance of that column (scalar). Otherwise returns the covariance matrix (2D array).

Return type

scalar, array
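
The parameter descriptions above mirror those of `numpy.cov`. Assuming the same convention is followed (an assumption, not confirmed by this page), the weighted covariance of two columns can be sketched in plain Python as follows; `weighted_cov` is a hypothetical helper.

```python
def weighted_cov(x, y, fweights=None, aweights=None, ddof=1):
    """Weighted covariance of two equal-length sequences (numpy.cov convention)."""
    n = len(x)
    f = [1] * n if fweights is None else list(fweights)
    a = [1.0] * n if aweights is None else list(aweights)
    w = [fi * ai for fi, ai in zip(f, a)]  # total weight per observation
    w_sum = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    # Normalization: reduces to N - ddof when no weights are given,
    # and to sum(w) when ddof=0.
    norm = w_sum - ddof * sum(wi * ai for wi, ai in zip(w, a)) / w_sum
    cross = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    return cross / norm


values = [1.0, 2.0, 3.0, 4.0]
variance = weighted_cov(values, values)
# A frequency weight of 2 is equivalent to repeating the observation twice.
with_fweights = weighted_cov([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], fweights=[1, 1, 2])
repeated = weighted_cov([1.0, 2.0, 3.0, 3.0], [1.0, 2.0, 3.0, 3.0])
```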

eval(literal='None'): Evaluate input literal and return results. Python’s eval() is given access to numpy (np), the catalog global size gsize and the columns.
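
A minimal sketch of this kind of evaluation, assuming the columns and gsize are simply placed in the namespace handed to Python's `eval()`. The helper `evaluate` is hypothetical, and `math` replaces numpy here only to keep the example dependency-free.

```python
import math


def evaluate(literal, columns, gsize):
    """Evaluate `literal` with column names and `gsize` in scope (hypothetical helper)."""
    # Names visible to the expression. The documented method exposes numpy
    # as `np`; `math` is used in this sketch instead.
    namespace = {'math': math, 'gsize': gsize}
    namespace.update(columns)
    return eval(literal, namespace)


columns = {'Weight': [1.0, 2.0, 3.0, 4.0], 'Z': [0.25, 0.8, 1.1, 1.4]}
mean_weight = evaluate('sum(Weight) / gsize', columns, gsize=4)
root_z = evaluate('math.sqrt(Z[0])', columns, gsize=4)
```

As with any use of `eval()`, the literal should come from a trusted source.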

extend(other): Extend catalog with other.

falses(): Return array of size size filled with False.

classmethod from_array(array, columns=None, mpiroot=0, mpistate=0, mpicomm=None, **kwargs)

Build BaseCatalog from input array.

Parameters

columns (list) – List of columns to read from array.
mpiroot (int, default=0) – Rank of process where input array lives.
mpistate (string, mpi.CurrentMPIState) – MPI state of the input array: ‘scattered’, ‘gathered’ or ‘broadcast’.
mpicomm (MPI communicator, default=None) – MPI communicator.
kwargs (dict) – Other arguments for __init__().

Returns

catalog

Return type

BaseCatalog

classmethod from_nbodykit(catalog, columns=None)

Build new catalog from nbodykit.

Parameters

catalog (nbodykit.base.catalog.CatalogSource) – nbodykit catalog.
columns (list, default=None) – Columns to import. Defaults to all columns.

Returns

catalog

Return type

BaseCatalog

classmethod from_state(state, mpistate=1, mpiroot=0, mpicomm=None): Instantiate and initialize class with state dictionary.

full(fill_value, dtype=numpy.float64): Return array of size size filled with fill_value.

get(column, *args, **kwargs): Return catalog (local) column column if it exists, else return the provided default.

gget(column, root=None): Return the global catalog column column on the process of rank root if it exists, else return the provided default. If root is None or Ellipsis, return the result on all processes.

gindices(): Row numbers in the global catalog.

property gsize: Return catalog global size, i.e. sum of size in each process.

gslice(*args): Perform global slicing of catalog, e.g. catalog.gslice(0,100,1) will return a new catalog of global size 100 (provided the catalog holds at least 100 rows). The new catalog keeps the same reference to attrs.
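
The relation between local rows, global row numbers (gindices), global size (gsize) and global slicing can be sketched without MPI. Below, a list of lists stands in for the chunks held by each process, stored in rank order; `global_slice` is a hypothetical helper, not the library's implementation.

```python
def global_slice(chunks, *args):
    """Apply slice(*args) to the global row index of scattered `chunks`."""
    gsize = sum(len(chunk) for chunk in chunks)  # global size
    wanted = range(*slice(*args).indices(gsize))  # global rows to keep
    new_chunks, offset = [], 0
    for chunk in chunks:
        # This process holds global rows offset, offset + 1, ...
        local = [chunk[i - offset] for i in wanted if offset <= i < offset + len(chunk)]
        new_chunks.append(local)
        offset += len(chunk)
    return new_chunks


# Three "processes" with 40 rows each, i.e. a global size of 120.
chunks = [list(range(0, 40)), list(range(40, 80)), list(range(80, 120))]
sliced = global_slice(chunks, 0, 100, 1)
local_sizes = [len(chunk) for chunk in sliced]
strided = global_slice(chunks, 0, 120, 50)
```

Each process keeps only the rows of the slice that it already holds, so no data moves between processes in this sketch.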

classmethod load(*args, **kwargs): Load catalog in numpy binary format from disk.

classmethod load_auto(filename, *args, **kwargs)

Load catalog from disk.

Parameters

filename (string) – File name of catalog. If it ends with ‘.fits’, calls load_fits(). Else (numpy binary format), calls load().
args (list) – Arguments for load function.
kwargs (dict) – Other arguments for load function.
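
The dispatch rule stated for filename amounts to a suffix test. A minimal sketch, where `pick_loader` is a hypothetical helper that returns the name of the method that would be called, and case-sensitive matching of the extension is an assumption:

```python
def pick_loader(filename):
    """Return the name of the loader selected for `filename`."""
    if filename.endswith('.fits'):
        return 'load_fits'
    # Anything else is treated as numpy binary format.
    return 'load'


fits_loader = pick_loader('catalog.fits')
numpy_loader = pick_loader('catalog.npy')
```

save_auto() is documented with the same rule, calling save_fits() or save() instead.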

classmethod load_fits(filename, columns=None, ext=None, mpiroot=0, mpistate=0, mpicomm=None)

Load catalog in fits binary format from disk.

Parameters

columns (list, default=None) – List of column names to read. Defaults to all columns.
ext (int, default=None) – fits extension. Defaults to first extension with data.
mpiroot (int, default=0) – Rank of process where input array lives.
mpistate (string, mpi.CurrentMPIState) – MPI state of the input array: ‘scattered’, ‘gathered’ or ‘broadcast’.
mpicomm (MPI communicator, default=None) – MPI communicator.

Returns

catalog

Return type

BaseCatalog

maximum(column): Return global maximum of column(s) column.

mean(column): Return global mean of column(s) column.

median(column): Return global median of column(s) column.

minimum(column): Return global minimum of column(s) column.

classmethod mpi_collect(self=None, sources=None, mpicomm=None)

Return new instance corresponding to self on larger mpicomm.

Parameters

self (object, None) – Instance to spread on mpicomm.
sources (list, None) – Ranks of processes of mpicomm where self lives. If None, takes the ranks of processes where self is not None.
mpicomm (MPI communicator) – New mpi communicator.

Returns

new

Return type

object

mpi_distribute(dests, mpicomm=None)

Return new instance corresponding to self on smaller mpicomm.

Parameters

self (object, None) – Instance to concentrate on mpicomm.
dests (list, None) – Ranks of processes of mpicomm where to send self. If None, takes the ranks of processes where self is not None.
mpicomm (MPI communicator) – New mpi communicator.

Returns

new

Return type

object, None

mpi_gather(): Gather catalog on a single process.

Warning

May blow up memory of the node this process runs on.

mpi_recv(source, tag=42): Receive catalog from rank source with tag tag.

mpi_scatter(): Scatter catalog on all processes.
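
The scatter and gather operations can be pictured without MPI. In the sketch below, `scatter` and `gather` are hypothetical helpers acting on a plain list of rows, and splitting into contiguous, near-equal chunks is an assumption about the layout rather than a documented guarantee.

```python
def scatter(rows, nproc):
    """Split `rows` into `nproc` contiguous, near-equal chunks."""
    base, extra = divmod(len(rows), nproc)
    chunks, start = [], 0
    for rank in range(nproc):
        # The first `extra` ranks receive one additional row.
        stop = start + base + (1 if rank < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def gather(chunks):
    """Concatenate all chunks back on a single 'process'."""
    return [row for chunk in chunks for row in chunk]


rows = list(range(10))
chunks = scatter(rows, 3)
gathered = gather(chunks)
```

After gathering, one process holds every row of the catalog, which is why gathering a large catalog can exhaust the memory of a single node.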

mpi_send(dest, tag=42): Send catalog to rank dest with tag tag.

mpi_to_state(mpistate): Return instance, changing current MPI state to mpistate.

property mpiattrs: MPI attributes.

nans(): Return array of size size filled with numpy.nan.

ones(dtype=numpy.float64): Return array of size size filled with one.

percentile(column, q=(15.87, 84.13)): Return global percentiles of column(s) column.

quantile(column, q=(0.1587, 0.8413), weights=None): Return global quantiles of column(s) column.
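
The default q values of percentile() and quantile() coincide with the cumulative probabilities of a Gaussian at -1 and +1 standard deviations, so the defaults presumably bracket the central 68.27% of a distribution. That reading is an inference from the numbers, not something this page states. The correspondence itself can be checked with the standard library:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)
lower = standard_normal.cdf(-1.0)  # probability of falling below -1 sigma
upper = standard_normal.cdf(1.0)   # probability of falling below +1 sigma

# Rounded to the precision used in the default arguments.
quantiles = (round(lower, 4), round(upper, 4))
percentiles = (round(100 * lower, 2), round(100 * upper, 2))
```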

save(filename): Save class to disk.

save_auto(filename, *args, **kwargs)

Write catalog to disk.

Parameters

filename (string) – File name of catalog. If it ends with ‘.fits’, calls save_fits(). Else (numpy binary format), calls save().
args (list) – Arguments for save function.
kwargs (dict) – Other arguments for save function.

save_fits(filename): Save catalog to filename as fits file. Possible to change fitsio to write by chunks?

set(column, item): Set column of name column.

property size: Equivalent to __len__().

std(column, **kwargs): Estimate weighted standard deviation. Same arguments as var().

sum(column): Return global sum of column(s) column.

to_array(columns=None, struct=True)

Return catalog as numpy array.

Parameters

columns (list, default=None) – Columns to use. Defaults to all catalog columns.
struct (bool, default=True) – Whether to return structured array, with columns accessible through e.g. array['Position']. If False, numpy will attempt to cast types of different columns.

Returns

array

Return type

array

to_nbodykit(columns=None)

Return catalog in nbodykit format.

Parameters: columns (list, default=None) – Columns to export. Defaults to all columns.
Returns: catalog
Return type: nbodykit.base.catalog.CatalogSource

to_stats(columns=None, quantities=None, sigfigs=2, tablefmt='latex_raw', filename=None)

Export catalog summary quantities.

Parameters

columns (list, default=None) – Columns to export quantities for. Defaults to all columns.
quantities (list, default=None) – Quantities to export. Defaults to ['mean','median','std'].
sigfigs (int, default=2) – Number of significant digits. See utils.round_measurement().
tablefmt (string, default='latex_raw') – Format for summary table. See tabulate.tabulate().
filename (string, default=None) – If not None, file name where to save summary table.

Returns

tab – Summary table.

Return type

string
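
A rough idea of such a summary table, using only the standard library. The documented method relies on tabulate.tabulate() and utils.round_measurement(); the hypothetical `summary_table` below replaces both with plain string formatting, so its output format is illustrative only.

```python
from statistics import mean, median, stdev


def summary_table(columns, quantities=('mean', 'median', 'std'), sigfigs=2):
    """Build a LaTeX-like table body, one row per column (hypothetical helper)."""
    funcs = {'mean': mean, 'median': median, 'std': stdev}
    lines = [' & '.join(('column',) + tuple(quantities))]
    for name, values in columns.items():
        # The 'g' format keeps `sigfigs` significant digits.
        cells = [f'{funcs[q](values):.{sigfigs}g}' for q in quantities]
        lines.append(' & '.join([name] + cells))
    return '\n'.join(lines)


table = summary_table({'Z': [0.5, 1.0, 1.5]})
rows = table.splitlines()
```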

trues(): Return array of size size filled with True.

var(column, fweights=None, aweights=None, ddof=1)

Estimate weighted parameter variance.

Parameters

column (list, string) – Column(s) to compute variance for.
fweights (array, int, default=None) – 1D array of integer frequency weights; the number of times each observation vector should be repeated.
aweights (array, default=None) – 1D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.
ddof (int, default=1) – Number of degrees of freedom.

Returns

var – If a single column is provided as column, returns the variance of that column (scalar). Otherwise returns an array of variances.

Return type

scalar, array

zeros(dtype=numpy.float64): Return array of size size filled with zero.