lib.catalog.base.BaseCatalog

class lib.catalog.base.BaseCatalog(data=None, columns=None, attrs=None)

Bases: cosmopipe.lib.utils.ScatteredBaseClass

Base class that represents a catalog, as a dictionary of columns stored as arrays.

Initialize BaseCatalog.

Parameters
  • data (dict, BaseCatalog) – Dictionary mapping column names to arrays. If a BaseCatalog instance, update self attributes from it.

  • columns (list, default=None) – List of column names. Defaults to data.keys().

  • attrs (dict) – Other attributes.

Methods

average

Return global average of column(s) column, with weights weights (defaults to 1).

columns

Return catalog column names, after optional selections.

concatenate

Concatenate catalogs together.

copy

Return copy, including column names columns (defaults to all columns).

cov

Estimate weighted covariance.

deepcopy

eval

Evaluate input literal and return results.

extend

Extend catalog with other.

falses

Return array of size size filled with False.

from_array

Build BaseCatalog from input array.

from_nbodykit

Build new catalog from nbodykit.

from_state

Instantiate and initialize class with state dictionary.

full

Return array of size size filled with fill_value.

get

Return catalog (local) column column if exists, else return provided default.

gget

Return global catalog column column on the process of rank root if it exists, else return provided default.

gindices

Row numbers in the global catalog.

gslice

Perform global slicing of catalog.

is_mpi_broadcast

is_mpi_gathered

is_mpi_root

is_mpi_scattered

load

Load catalog in numpy binary format from disk.

load_auto

Load catalog from disk.

load_fits

Load catalog in fits binary format from disk.

log_critical

log_debug

log_error

log_info

log_warning

maximum

Return global maximum of column(s) column.

mean

Return global mean of column(s) column.

median

Return global median of column(s) column.

minimum

Return global minimum of column(s) column.

mpi_broadcast

mpi_collect

Return new instance corresponding to self on larger mpicomm.

mpi_distribute

Return new instance corresponding to self on smaller mpicomm.

mpi_gather

Gather catalog on a single process.

mpi_recv

Receive catalog from rank source with tag tag.

mpi_scatter

Scatter catalog on all processes.

mpi_send

Send catalog to rank dest with tag tag.

mpi_to_state

Return instance, changing current MPI state to mpistate.

nans

Return array of size size filled with numpy.nan.

ones

Return array of size size filled with one.

percentile

Return global percentiles of column(s) column.

quantile

Return global quantiles of column(s) column.

save

Save class to disk.

save_auto

Write catalog to disk.

save_fits

Save catalog to filename as fits file.

set

Set column of name column.

std

Estimate weighted standard deviation.

sum

Return global sum of column(s) column.

to_array

Return catalog as numpy array.

to_nbodykit

Return catalog in nbodykit format.

to_stats

Export catalog summary quantities.

trues

Return array of size size filled with True.

var

Estimate weighted parameter variance.

zeros

Return array of size size filled with zero.

Attributes

gsize

Return catalog global size, i.e. sum of size in each process.

logger

mpiattrs

MPI attributes

mpistate

size

Equivalent to __len__().

average(column, weights=None)

Return global average of column(s) column, with weights weights (defaults to 1).
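
A global average can be assembled from per-process partial sums. The sketch below uses plain NumPy, with the list `chunks` standing in for the local column held by each MPI rank; `global_average` is a hypothetical helper written for illustration, not the library's implementation.

```python
import numpy as np

def global_average(chunks, weight_chunks=None):
    """Weighted average over all chunks, one chunk per simulated process."""
    if weight_chunks is None:
        # Weights default to 1, as stated in the description above.
        weight_chunks = [np.ones_like(chunk, dtype='f8') for chunk in chunks]
    # Each process contributes two partial sums; an MPI reduction would add them up.
    num = sum((w * x).sum() for x, w in zip(chunks, weight_chunks))
    den = sum(w.sum() for w in weight_chunks)
    return num / den

chunks = [np.array([1., 2.]), np.array([3., 4., 5.])]
weights = [np.array([1., 1.]), np.array([2., 2., 2.])]
avg = global_average(chunks, weights)
```

The result agrees with `numpy.average` applied to the concatenated column and weights.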

columns(include=None, exclude=None)

Return catalog column names, after optional selections.

Parameters
  • include (list, string, default=None) – Single or list of regex patterns to select column names to include. Defaults to all columns.

  • exclude (list, string, default=None) – Single or list of regex patterns to select column names to exclude. Defaults to no columns.

Returns

columns – Catalog column names, after optional selections.

Return type

list
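
The include/exclude selection can be pictured with the standard `re` module. This is a sketch only: `select_columns` is a hypothetical helper, and anchoring patterns at the start of the name with `re.match` is an assumption, since the text above does not say how patterns are applied.

```python
import re

def select_columns(names, include=None, exclude=None):
    """Filter names with regex patterns (hypothetical helper, not the library's code)."""
    def as_list(patterns):
        # Accept a single pattern or a list of patterns.
        return [patterns] if isinstance(patterns, str) else list(patterns)

    selected = list(names)
    if include is not None:
        # Assumption: a pattern must match from the start of the name (re.match).
        selected = [name for name in selected
                    if any(re.match(p, name) for p in as_list(include))]
    if exclude is not None:
        selected = [name for name in selected
                    if not any(re.match(p, name) for p in as_list(exclude))]
    return selected

names = ['Position', 'Velocity', 'Weight', 'WeightFKP']
picked = select_columns(names, include='Weight.*', exclude=['.*FKP'])
```

Here `picked` keeps only the names matched by an include pattern and by no exclude pattern.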

classmethod concatenate(*others)

Concatenate catalogs together.

Parameters

others (list) – List of BaseCatalog instances.

Returns

new

Return type

BaseCatalog

Warning

The attrs of the returned catalog contain, for each key, the last value found in the attrs dictionaries of others.
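
The "last value wins" behaviour described in the warning matches what successive `dict.update` calls produce. The sketch below models each catalog as a `(data, attrs)` pair of dictionaries, which is an assumption made to keep the example self-contained; the function is illustrative and not the library's code.

```python
import numpy as np

def concatenate(*catalogs):
    """Concatenate (data, attrs) pairs; hypothetical stand-in for catalog instances."""
    data, attrs = {}, {}
    for column in catalogs[0][0]:
        data[column] = np.concatenate([cat_data[column] for cat_data, _ in catalogs])
    for _, cat_attrs in catalogs:
        # Later catalogs overwrite earlier ones, key by key.
        attrs.update(cat_attrs)
    return data, attrs

first = ({'Weight': np.ones(2)}, {'name': 'first', 'zmin': 0.1})
second = ({'Weight': np.zeros(3)}, {'name': 'second'})
data, attrs = concatenate(first, second)
```

The key `name` ends up with the value from `second`, while `zmin`, present only in `first`, is kept.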

copy(columns=None)

Return copy, including column names columns (defaults to all columns).

cov(columns=None, fweights=None, aweights=None, ddof=1)

Estimate weighted covariance.

Parameters
  • columns (list, default=None) – Columns to compute covariance for.

  • fweights (array, int, default=None) – 1D array of integer frequency weights; the number of times each observation vector should be repeated.

  • aweights (array, default=None) – 1D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

  • ddof (int, default=1) – Number of degrees of freedom.

Returns

cov – If single parameter provided as columns, returns variance for that parameter (scalar). Else returns covariance (2D array).

Return type

scalar, array
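
The fweights, aweights and ddof descriptions above read like those of `numpy.cov`. Assuming the same semantics (an assumption, not something this page confirms), the sketch below shows that integer frequency weights are equivalent to repeating observations, and that a single column gives a scalar-like result.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=5)
y = rng.normal(size=5)
fweights = np.array([1, 2, 1, 3, 1])  # integer repeat counts

# Covariance of two columns with frequency weights (numpy.cov expects variables as rows).
cov_weighted = np.cov([x, y], fweights=fweights, ddof=1)

# Same result from explicitly repeating each observation.
cov_repeated = np.cov([np.repeat(x, fweights), np.repeat(y, fweights)], ddof=1)

# A single column yields a zero-dimensional (scalar-like) variance, not a 2D matrix.
var_x = np.cov(x, fweights=fweights, ddof=1)
```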

eval(literal='None')

Evaluate input literal and return results. Python’s eval() is provided access to numpy (np), catalog global size gsize and columns.
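
A rough picture of such an evaluation, using Python's built-in `eval()` with an explicit namespace. Exposing each column under its own name, and the helper `evaluate` itself, are assumptions made for illustration; the library may organise its namespace differently.

```python
import numpy as np

def evaluate(literal, data):
    """Evaluate literal with np, gsize and column arrays in scope (illustrative only)."""
    # Assumption: a single process, so the global size equals the local size.
    gsize = len(next(iter(data.values())))
    namespace = {'np': np, 'gsize': gsize, **data}
    # Builtins are emptied to keep the namespace small; this is not a security sandbox.
    return eval(literal, {'__builtins__': {}}, namespace)

data = {'Z': np.array([0.2, 0.8, 1.4]), 'Weight': np.array([1., 2., 1.])}
mask = evaluate('(Z > 0.5) & (Weight < 2.)', data)
total = evaluate('np.sum(Weight) / gsize', data)
```

Since the literal is executed as Python code, it should only ever come from a trusted source.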

extend(other)

Extend catalog with other.

falses()

Return array of size size filled with False.

classmethod from_array(array, columns=None, mpiroot=0, mpistate=0, mpicomm=None, **kwargs)

Build BaseCatalog from input array.

Parameters
  • columns (list) – List of columns to read from array.

  • mpiroot (int, default=0) – Rank of process where input array lives.

  • mpistate (string, mpi.CurrentMPIState) – MPI state of the input array: ‘scattered’, ‘gathered’ or ‘broadcast’.

  • mpicomm (MPI communicator, default=None) – MPI communicator.

  • kwargs (dict) – Other arguments for __init__().

Returns

catalog

Return type

BaseCatalog

classmethod from_nbodykit(catalog, columns=None)

Build new catalog from nbodykit.

Parameters
  • catalog (nbodykit.base.catalog.CatalogSource) – nbodykit catalog.

  • columns (list, default=None) – Columns to import. Defaults to all columns.

Returns

catalog

Return type

BaseCatalog

classmethod from_state(state, mpistate=1, mpiroot=0, mpicomm=None)

Instantiate and initialize class with state dictionary.

full(fill_value, dtype=<class 'numpy.float64'>)

Return array of size size filled with fill_value.

get(column, *args, **kwargs)

Return catalog (local) column column if exists, else return provided default.

gget(column, root=None)

Return global catalog column column on the process of rank root if it exists, else return provided default. If root is None or Ellipsis, return the result on all processes.

gindices()

Row numbers in the global catalog.

property gsize

Return catalog global size, i.e. sum of size in each process.
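
The relation between local sizes, the global size and global row numbers can be sketched without MPI. The list `local_sizes` simulates three ranks, and numbering rows contiguously rank after rank is an assumption; this page does not state how gindices() orders rows.

```python
import numpy as np

local_sizes = [3, 2, 4]  # rows held by ranks 0, 1, 2 (simulated)

# Global size: sum of the local sizes.
gsize = sum(local_sizes)

# Assumption: rows are numbered contiguously, rank after rank.
offsets = np.cumsum([0] + local_sizes[:-1])
gindices = [offset + np.arange(size) for offset, size in zip(offsets, local_sizes)]
```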

gslice(*args)

Perform global slicing of catalog, e.g. catalog.gslice(0,100,1) returns a new catalog of global size 100 (for a catalog of at least 100 rows). The new catalog holds the same reference to attrs.
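
Global slicing means the slice applies to row numbers of the whole catalog, not to each local chunk. A minimal sketch with simulated ranks follows; `global_slice` is a hypothetical helper, and the contiguous rank-by-rank row ordering is an assumption.

```python
import numpy as np

def global_slice(chunks, *args):
    """Apply slice(*args) to the concatenation of chunks, chunk by chunk."""
    sizes = [len(chunk) for chunk in chunks]
    gsize = sum(sizes)
    # Global row numbers selected by the slice.
    selected = np.arange(gsize)[slice(*args)]
    out, start = [], 0
    for chunk, size in zip(chunks, sizes):
        # Keep the selected rows that live in this chunk, converted to local indices.
        local = selected[(selected >= start) & (selected < start + size)] - start
        out.append(chunk[local])
        start += size
    return out

chunks = [np.arange(0, 60), np.arange(60, 150)]  # two simulated ranks
sliced = global_slice(chunks, 0, 100, 1)
```

The first simulated rank keeps all of its 60 rows and the second keeps 40, for a global size of 100.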

classmethod load(*args, **kwargs)

Load catalog in numpy binary format from disk.

classmethod load_auto(filename, *args, **kwargs)

Load catalog from disk.

Parameters
  • filename (string) – File name of catalog. If ends with ‘.fits’, calls load_fits(). Else (numpy binary format), calls load().

  • args (list) – Arguments for load function.

  • kwargs (dict) – Other arguments for load function.
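
The dispatch rule on the file name can be written in one line. `pick_loader` below is a hypothetical helper that only returns the name of the method the rule would select; it does not read any file.

```python
def pick_loader(filename):
    """Return the name of the loader selected by the suffix rule described above."""
    # A '.fits' suffix selects the fits reader; anything else is treated as numpy binary format.
    return 'load_fits' if filename.endswith('.fits') else 'load'

choices = {name: pick_loader(name) for name in ['data.fits', 'data.npy', 'data.fits.gz']}
```

Under a strict suffix test, a compressed name such as `data.fits.gz` does not end with ‘.fits’ and would fall through to the numpy binary loader.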

classmethod load_fits(filename, columns=None, ext=None, mpiroot=0, mpistate=0, mpicomm=None)

Load catalog in fits binary format from disk.

Parameters
  • columns (list, default=None) – List of column names to read. Defaults to all columns.

  • ext (int, default=None) – fits extension. Defaults to first extension with data.

  • mpiroot (int, default=0) – Rank of process where input array lives.

  • mpistate (string, mpi.CurrentMPIState) – MPI state of the input array: ‘scattered’, ‘gathered’ or ‘broadcast’.

  • mpicomm (MPI communicator, default=None) – MPI communicator.

Returns

catalog

Return type

BaseCatalog

maximum(column)

Return global maximum of column(s) column.

mean(column)

Return global mean of column(s) column.

median(column)

Return global median of column(s) column.

minimum(column)

Return global minimum of column(s) column.

classmethod mpi_collect(self=None, sources=None, mpicomm=None)

Return new instance corresponding to self on larger mpicomm.

Parameters
  • self (object, None) – Instance to spread on mpicomm.

  • sources (list, None) – Ranks of processes of mpicomm where self lives. If None, takes the ranks of processes where self is not None.

  • mpicomm (MPI communicator) – New mpi communicator.

Returns

new

Return type

object

mpi_distribute(dests, mpicomm=None)

Return new instance corresponding to self on smaller mpicomm.

Parameters
  • self (object, None) – Instance to concentrate on mpicomm.

  • dests (list, None) – Ranks of processes of mpicomm to send self to. If None, takes the ranks of processes where self is not None.

  • mpicomm (MPI communicator) – New mpi communicator.

Returns

new

Return type

object, None

mpi_gather()

Gather catalog on a single process.

Warning

May blow up memory of the node this process runs on.

mpi_recv(source, tag=42)

Receive catalog from rank source with tag tag.

mpi_scatter()

Scatter catalog on all processes.
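
Scattering and gathering can be pictured on a single column with NumPy alone. The split rule used here (`numpy.array_split`, near-equal chunks) is an assumption for illustration; the library may distribute rows differently.

```python
import numpy as np

nranks = 4
column = np.arange(10.)

# Scatter: split the rows into one chunk per process (the split rule is an assumption).
chunks = np.array_split(column, nranks)

# Gather: a single process rebuilds the full column, holding all rows in memory at once.
gathered = np.concatenate(chunks)
```

The gathered array is as large as the whole catalog column, which is why gathering can exhaust the memory of the node hosting that process.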

mpi_send(dest, tag=42)

Send catalog to rank dest with tag tag.

mpi_to_state(mpistate)

Return instance, changing current MPI state to mpistate.

property mpiattrs

MPI attributes

nans()

Return array of size size filled with numpy.nan.

ones(dtype=<class 'numpy.float64'>)

Return array of size size filled with one.

percentile(column, q=(15.87, 84.13))

Return global percentiles of column(s) column.

quantile(column, q=(0.1587, 0.8413), weights=None)

Return global quantiles of column(s) column.
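
The default q values of percentile() and quantile() correspond to the standard normal cumulative distribution at -1 and +1 sigma, so they bracket the central 68.27% of a Gaussian. The sketch below checks this with `math.erf` and `numpy.percentile` on a local sample; it does not use the catalog methods themselves.

```python
import math
import numpy as np

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1. + math.erf(x / math.sqrt(2.)))

# The default q values match the Gaussian CDF at -1 and +1 sigma.
q_low, q_high = normal_cdf(-1.), normal_cdf(1.)

# On a Gaussian sample, these percentiles therefore land close to -1 and +1.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0., scale=1., size=100000)
low, high = np.percentile(sample, q=(15.87, 84.13))
```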

save(filename)

Save class to disk.

save_auto(filename, *args, **kwargs)

Write catalog to disk.

Parameters
  • filename (string) – File name of catalog. If ends with ‘.fits’, calls save_fits(). Else (numpy binary format), calls save().

  • args (list) – Arguments for save function.

  • kwargs (dict) – Other arguments for save function.

save_fits(filename)

Save catalog to filename as fits file. Open question: can fitsio be made to write by chunks?

set(column, item)

Set column of name column.

property size

Equivalent to __len__().

std(column, **kwargs)

Estimate weighted standard deviation. Same arguments as var().

sum(column)

Return global sum of column(s) column.

to_array(columns=None, struct=True)

Return catalog as numpy array.

Parameters
  • columns (list, default=None) – Columns to use. Defaults to all catalog columns.

  • struct (bool, default=True) – Whether to return structured array, with columns accessible through e.g. array['Position']. If False, numpy will attempt to cast types of different columns.

Returns

array

Return type

array
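
The difference between the two output forms can be shown with NumPy directly. This sketch builds both kinds of array by hand from a dictionary of columns; the column names and the use of `numpy.column_stack` for the unstructured case are assumptions, not the library's implementation.

```python
import numpy as np

data = {'Position': np.zeros((4, 3)), 'NGal': np.arange(4)}

# Structured array: one named field per column, each keeping its own dtype and shape.
dtype = [('Position', 'f8', (3,)), ('NGal', 'i8')]
structured = np.empty(4, dtype=dtype)
for name, column in data.items():
    structured[name] = column

# Plain array: columns are stacked side by side and cast to a common dtype (here float64).
plain = np.column_stack([data['Position'], data['NGal']])
```

With the structured form, `structured['Position']` returns the (4, 3) block unchanged, whereas the plain form loses column names and turns the integer column into floats.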

to_nbodykit(columns=None)

Return catalog in nbodykit format.

Parameters

columns (list, default=None) – Columns to export. Defaults to all columns.

Returns

catalog

Return type

nbodykit.base.catalog.CatalogSource

to_stats(columns=None, quantities=None, sigfigs=2, tablefmt='latex_raw', filename=None)

Export catalog summary quantities.

Parameters
  • columns (list, default=None) – Columns to export quantities for. Defaults to all columns.

  • quantities (list, default=None) – Quantities to export. Defaults to ['mean','median','std'].

  • sigfigs (int, default=2) – Number of significant digits. See utils.round_measurement().

  • tablefmt (string, default='latex_raw') – Format for summary table. See tabulate.tabulate().

  • filename (string, default=None) – If not None, file name where to save summary table.

Returns

tab – Summary table.

Return type

string

trues()

Return array of size size filled with True.

var(column, fweights=None, aweights=None, ddof=1)

Estimate weighted parameter variance.

Parameters
  • column (string, list) – Column(s) to compute variance for.

  • fweights (array, int, default=None) – 1D array of integer frequency weights; the number of times each observation vector should be repeated.

  • aweights (array, default=None) – 1D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

  • ddof (int, default=1) – Number of degrees of freedom.

Returns

var – If single parameter provided as column, returns variance for that parameter (scalar). Else returns variance array.

Return type

scalar, array

zeros(dtype=<class 'numpy.float64'>)

Return array of size size filled with zero.