noparama
v0.0.1
Nonparametric Bayesian models
#include <membertrix.h>
Public Member Functions
membertrix ()
membertrix (const membertrix &other)
membertrix * clone ()
~membertrix ()
cluster_id_t addCluster (cluster_t *cluster)
cluster_t * getCluster (cluster_id_t cluster_id)
data_id_t addData (data_t &data)
data_t * getDatum (data_id_t data_id)
np_error_t assign (cluster_id_t cluster_id, data_id_t data_id)
np_error_t retract (cluster_id_t cluster_id, data_id_t data_id, bool auto_remove=true)
np_error_t retract (data_id_t data_id, bool auto_remove=true)
np_error_t remove (cluster_id_t cluster_id)
bool assigned (data_id_t data_id) const
void getAssignments (cluster_id_t cluster_id, data_ids_t &data_ids) const
cluster_id_t getClusterId (data_id_t data_id) const
const clusters_t & getClusters () const
size_t getClusterCount () const
void relabel ()
void print (cluster_id_t cluster_id, std::ostream &os) const
void print (std::ostream &os) const
dataset_t * getData ()
dataset_t * getData (const cluster_id_t cluster_id) const
void getData (const cluster_id_t cluster_id, dataset_t &dataset) const
void getData (const data_ids_t data_ids, dataset_t &dataset) const
size_t count (cluster_id_t cluster_id) const
size_t count () const
bool empty (cluster_id_t cluster_id)
int cleanup ()
membertrix & operator= (membertrix other)
Protected Member Functions
bool exists (cluster_id_t cluster_id)
Friends
std::ostream & operator<< (std::ostream &os, const membertrix &m)
void swap (membertrix &first, membertrix &second)
The membertrix data structure is a binary matrix optimized for storing membership information. The membership is asymmetric: a data item can be assigned to only one cluster, whereas a cluster can have multiple data points as members.
The data points and clusters are stored in separate vectors.
The structure is stored with clusters as columns, and data items as rows. Reasons:
Usage:
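A minimal sketch of the layout described above (data items as rows, clusters as columns, at most one set bit per row). MembershipSketch and all of its members are hypothetical stand-ins for illustration. They are not the membertrix implementation, which keeps references to cluster_t and data_t objects and returns np_error_t codes. Rejecting a second assignment of the same data item is an assumption made here to enforce the one-cluster-per-item rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for illustration; not the membertrix implementation.
class MembershipSketch {
public:
    // Add a column (cluster) and return its index.
    std::size_t addCluster() {
        for (auto &row : rows_) row.push_back(false);
        return cluster_count_++;
    }
    // Add a row (data item) and return its index.
    std::size_t addData() {
        rows_.push_back(std::vector<bool>(cluster_count_, false));
        return rows_.size() - 1;
    }
    // True if the row of this data item has any bit set.
    bool assigned(std::size_t data_id) const {
        for (bool bit : rows_.at(data_id)) {
            if (bit) return true;
        }
        return false;
    }
    // Set the bit; refused if the data item already belongs to a cluster
    // (assumption: this is how the asymmetry is enforced in this sketch).
    bool assign(std::size_t cluster_id, std::size_t data_id) {
        if (cluster_id >= cluster_count_ || data_id >= rows_.size()) return false;
        if (assigned(data_id)) return false;
        rows_[data_id][cluster_id] = true;
        return true;
    }
    // Clear the bit; refused if the pair was not assigned.
    bool retract(std::size_t cluster_id, std::size_t data_id) {
        if (cluster_id >= cluster_count_ || data_id >= rows_.size()) return false;
        if (!rows_[data_id][cluster_id]) return false;
        rows_[data_id][cluster_id] = false;
        return true;
    }
    // Number of data items in one cluster (set bits in one column).
    std::size_t count(std::size_t cluster_id) const {
        std::size_t n = 0;
        for (const auto &row : rows_) {
            if (row.at(cluster_id)) ++n;
        }
        return n;
    }

private:
    std::vector<std::vector<bool>> rows_;  // rows_[data_id][cluster_id]
    std::size_t cluster_count_ = 0;
};
```

In this sketch the ids returned by addCluster() and addData() are simply column and row indices, which is why they have to be kept by the caller for later assign() and retract() calls.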
References: [1] https://eigen.tuxfamily.org/dox/group__TopicStorageOrders.html
membertrix::membertrix ()
The default constructor.
membertrix::membertrix (const membertrix &other)
The copy constructor. This is not a true copy constructor: if you copy a membership matrix, we assume you want to optimize the internal structures. This invalidates all cluster ids. If an exact clone is required you will need to add a clone() member function.
This constructor calls addData and addCluster to keep all internal data structures consistent and to reduce the matrix to its minimum size. The alternative would be all kinds of bookkeeping: swapping columns in the matrix, moving datasets from one cluster to the next, etc.
membertrix::~membertrix ()
The destructor.
cluster_id_t membertrix::addCluster (cluster_t *cluster)
Add a cluster to the membership matrix. The cluster is not physically stored, only a reference is kept. If the memory is deallocated, errors can be expected.
The returned index should be kept as a reference for use in the functions assign() and retract().
[in] | cluster | A cluster object |
data_id_t membertrix::addData (data_t &data)
Add a data point to the membership matrix. The data are not physically stored, only a reference is kept. If the memory is deallocated, errors can be expected.
The returned index should be kept as a reference for use in the functions assign() and retract().
[in] | data | A data object |
np_error_t membertrix::assign (cluster_id_t cluster_id, data_id_t data_id)
Assign a previously added data item (through addData) to a previously added cluster (through addCluster).
[in] | cluster_id | An index to a cluster object |
[in] | data_id | An index to a data point |
bool membertrix::assigned (data_id_t data_id) const
If the data item is assigned to any cluster this function will return true. In all other cases it returns false.
[in] | data_id | An index to a data point |
int membertrix::cleanup ()
Clean up internal data structures after data items and clusters have been assigned, removing all clusters that did not get any data assigned.
membertrix * membertrix::clone ()
size_t membertrix::count (cluster_id_t cluster_id) const
Return count of data points within the given cluster.
[in] | cluster_id | An index to a particular cluster |
size_t membertrix::count () const
Return total number of data points. This should be the same as calling count(cluster_id_t) for each cluster returned by getClusters().
bool membertrix::empty (cluster_id_t cluster_id)
Indicate whether a cluster is empty (no data items assigned) or non-empty (one or more data items assigned to it).
bool membertrix::exists (cluster_id_t cluster_id) [protected]
void membertrix::getAssignments (cluster_id_t cluster_id, data_ids_t &data_ids) const
Get all assignments to given cluster.
[in] | cluster_id | An index to a cluster object |
[out] | data_ids | Set of data ids |
cluster_t * membertrix::getCluster (cluster_id_t cluster_id)
Get cluster given cluster id.
[in] | cluster_id | An index to a cluster object |
size_t membertrix::getClusterCount () const
Get number of clusters. Note that retract and assign adjust the number of clusters!
cluster_id_t membertrix::getClusterId (data_id_t data_id) const
Get the cluster id given a particular data id.
[in] | data_id | An index to a data point |
const clusters_t & membertrix::getClusters () const
Get all clusters to iterate over them. The cluster set is const to protect the user from accidentally removing clusters in a for-loop in a way that destroys the user iterator.
This function can be used to adjust the parameters of the cluster_t objects. The function only reads cluster information and does not change the membertrix instance, hence it is const.
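A small sketch of why a const container still allows the cluster parameters to be adjusted: when the container holds pointers, constness applies to the pointers, not to the objects they point to. Cluster, Clusters and shiftMeans are hypothetical names, and modelling the cluster set as a std::vector of raw pointers is an assumption; the actual definition of clusters_t is not given here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cluster type with a single adjustable parameter.
struct Cluster {
    double mean;
};

// Assumption: the cluster set is modelled as a vector of non-owning pointers.
using Clusters = std::vector<Cluster *>;

// The container is const, so clusters cannot be added or removed here,
// but each pointed-to Cluster can still be modified.
void shiftMeans(const Clusters &clusters, double delta) {
    for (Cluster *cluster : clusters) {
        cluster->mean += delta;
    }
}
```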
dataset_t * membertrix::getData ()
Return all data points.
dataset_t * membertrix::getData (const cluster_id_t cluster_id) const
Return all data points that are assigned to a particular cluster.
[in] | cluster_id | An index to a particular cluster |
[out] | dataset | A dataset (vector) of data points that have been assigned through assign() |
void membertrix::getData (const cluster_id_t cluster_id, dataset_t &dataset) const
void membertrix::getData (const data_ids_t data_ids, dataset_t &dataset) const
Return a particular subset of data points. The points may or may not belong to a particular cluster, as long as they have been added through addData(); they do not have to be assigned to a cluster yet.
[in] | data_ids | A set of data point ids |
data_t * membertrix::getDatum (data_id_t data_id)
Return data point with given index.
[in] | data_id | An index to a particular data point |
membertrix & membertrix::operator= (membertrix other)
The assignment operator takes its argument by value rather than by reference, so the copy is already made when the function is entered. Subsequently, only a swap operation needs to be called.
[in] | other | Another membertrix object |
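The pattern described above is commonly known as the copy-and-swap idiom. The following sketch shows it on a hypothetical Table class with a single member; Table and cells() are illustrative names and do not reflect the fields of membertrix.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical class; it only illustrates the copy-and-swap idiom.
class Table {
public:
    Table() = default;
    explicit Table(std::vector<int> cells) : cells_(std::move(cells)) {}
    Table(const Table &other) = default;

    // The argument is taken by value, so the copy happens at the call site.
    // Swapping with that copy leaves the old contents in 'other', which is
    // destroyed when the function returns.
    Table &operator=(Table other) {
        swap(*this, other);
        return *this;
    }

    // Swap member by member; nothing is copied.
    friend void swap(Table &first, Table &second) noexcept {
        using std::swap;
        swap(first.cells_, second.cells_);
    }

    const std::vector<int> &cells() const { return cells_; }

private:
    std::vector<int> cells_;
};
```

A side effect of this design is that self-assignment needs no special case: the copy is made before the swap, so assigning an object to itself only costs one redundant copy.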
void membertrix::print (cluster_id_t cluster_id, std::ostream &os) const
Print cluster to stream.
void membertrix::print (std::ostream &os) const
Print entire membership matrix to stream.
void membertrix::relabel ()
Aggressive restructuring of all data structures. This will relabel all cluster ids to consecutive numbers. The assignments remain valid, but under different cluster ids.
This is called automatically on assignments!
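To illustrate what relabeling to consecutive numbers means, the sketch below maps a list of possibly sparse cluster ids onto 0, 1, 2, ... in order of first appearance while preserving which data items share a cluster. relabelConsecutive is a hypothetical helper operating on a plain vector; the numbering order and the data layout are assumptions, not the behaviour of relabel() itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// assignment[i] holds the cluster id of data item i; ids may be sparse.
// Rewrites the ids in place to 0..k-1 and returns k, the number of clusters.
std::size_t relabelConsecutive(std::vector<int> &assignment) {
    std::map<int, int> new_id;  // old cluster id -> new consecutive id
    for (int &id : assignment) {
        auto it = new_id.find(id);
        if (it == new_id.end()) {
            // First time this cluster is seen: give it the next free number.
            it = new_id.emplace(id, static_cast<int>(new_id.size())).first;
        }
        id = it->second;
    }
    return new_id.size();
}
```

Any cluster id stored by the caller before such a relabeling refers to a different cluster afterwards (or to none), which is why ids should be looked up again rather than cached.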
np_error_t membertrix::remove (cluster_id_t cluster_id)
Remove a cluster (should be empty). This function should be called if auto_remove is set to false in the retract() functions. If auto_remove is set to true (default) there is no need to call remove().
[in] | cluster_id | Index to cluster to be removed |
np_error_t membertrix::retract (cluster_id_t cluster_id, data_id_t data_id, bool auto_remove = true)
Retract a previously assigned data-cluster pair (through assign). If the cluster does not have any data points left, the cluster object is also deallocated, depending on the auto_remove setting.
[in] | cluster_id | An index to a cluster object |
[in] | data_id | An index to a data point |
[in] | auto_remove | Automatically deallocate cluster object if there is no data assigned anymore |
np_error_t membertrix::retract (data_id_t data_id, bool auto_remove = true)
Retract a previously assigned data-cluster pair (through assign) where the search for this particular cluster is left to getClusterId(data_id). If the cluster does not have any data points left, the cluster object is also deallocated (depending on the auto_remove flag). This has the same effect as:
retract(getClusterId(data_id), data_id);
[in] | data_id | An index to a data point |
[in] | auto_remove | Automatically deallocate cluster object if there is no data assigned anymore |
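The relationship between the two retract() overloads and the auto_remove flag can be sketched as follows. RetractSketch, hasCluster() and the use of plain int ids with bool return values are hypothetical simplifications; the real functions work on cluster_id_t and data_id_t and return np_error_t.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for illustration; not the membertrix implementation.
class RetractSketch {
public:
    // Assign a data item to a cluster; refused if it is already assigned.
    bool assign(int cluster_id, int data_id) {
        if (cluster_of_.count(data_id) > 0) return false;
        cluster_of_[data_id] = cluster_id;
        ++size_[cluster_id];
        return true;
    }
    int getClusterId(int data_id) const { return cluster_of_.at(data_id); }

    // Retract a specific pair; drop the cluster if it became empty
    // and auto_remove is set.
    bool retract(int cluster_id, int data_id, bool auto_remove = true) {
        auto it = cluster_of_.find(data_id);
        if (it == cluster_of_.end() || it->second != cluster_id) return false;
        cluster_of_.erase(it);
        if (--size_[cluster_id] == 0 && auto_remove) size_.erase(cluster_id);
        return true;
    }
    // Look up the cluster first, then delegate to the overload above.
    bool retract(int data_id, bool auto_remove = true) {
        auto it = cluster_of_.find(data_id);
        if (it == cluster_of_.end()) return false;
        return retract(it->second, data_id, auto_remove);
    }
    // Explicit removal of an empty cluster, needed when auto_remove was false.
    bool remove(int cluster_id) {
        auto it = size_.find(cluster_id);
        if (it == size_.end() || it->second != 0) return false;
        size_.erase(it);
        return true;
    }
    bool hasCluster(int cluster_id) const { return size_.count(cluster_id) > 0; }

private:
    std::map<int, int> cluster_of_;  // data id -> cluster id
    std::map<int, int> size_;        // cluster id -> number of assigned items
};
```

With auto_remove set to false the emptied cluster stays registered until remove() is called, which matches the division of labour described for remove() above.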
std::ostream & operator<< (std::ostream &os, const membertrix &m) [friend]
Allow a membership to be printed to a standard stream using the << operator.
void swap (membertrix &first, membertrix &second) [friend]
Swap all member fields of the two objects. This is a very lightweight implementation that only swaps the five member fields at the level of references; nothing is copied.