API Reference
This section provides a detailed reference for the Nexus-CAT API. For a complete list of classes and methods, refer to the source code.
main.py
main function
Run the full analysis pipeline. Executes the sequential workflow: scan trajectory file, iterate frames, find neighbors, build clusters via union-find, run enabled analyzers, and write results to the export directory. Performance metrics are optionally recorded at each step.
Parameters:
settings (Settings): Configuration object containing parameters for project setup, verbosity, frame ranges, analysis options, export paths, logging, and performance tracking.
Descriptions
Initializes a performance tracker with a unique ID and timestamped run name if enabled.
Sets up resource tracking including memory and CPU usage if
performanceis enabled.Prints version title and detailed settings if
verbosityis enabled.Creates necessary export directories.
Saves initial logs if
save_logsis enabled.Initializes data reader and system objects; measures reader initialization time.
Determines the total number of frames to process based on the settings.
Creates analyzer instances based on the analysis configuration.
Uses
tqdmprogress bar for visual feedback during frame processing.For each frame:
Applies custom lattice if configured.
Initializes nodes and performs neighbor finding using a strategy factory.
Builds clusters from the found neighbors.
Runs analysis on clusters.
Optionally prints unwrapped clusters.
Tracks and records time for each step and resource usage every 10 frames or on completion if
performanceis enabled.
After processing frames:
Calls analyzers to print results to file.
Collects overall performance metrics including memory usage changes.
Saves overall performance metrics if enabled.
Behavior and Side Effects
Creates and manages directories and files for output.
Utilizes multiple factory classes (
ReaderFactory,StrategyFactory,AnalyzerFactory,WriterFactory) to modularize component creation.Uses
psutilto obtain CPU and memory usage stats.Tracks performance per frame and aggregates metrics like average times and node counts.
Saves logs, cluster data, and performance data conditionally based on settings.
Usage Example
from nexus import SettingsBuilder, main
import nexus.config.settings as c
# General settings
config_general = c.GeneralSettings(
...
)
# Lattice settings
config_lattice = c.LatticeSettings(
...
)
# Clustering settings
config_clustering = c.ClusteringSettings(
...
)
# Analysis settings
config_analysis = c.AnalysisSettings(
...
)
# Build Settings object
settings = (
SettingsBuilder()
.with_general(config_general) # General settings \
.with_lattice(config_lattice) # Lattice settings \
.with_clustering(config_clustering) # Clustering settings \
.with_analysis(config_analysis) # Analysis settings \
.build() # Build the settings object
)
# Run main processing function
main(settings)
config Module (Settings)
GeneralSettings
General configuration parameters for the analysis pipeline.
Attributes
project_name(str): Name of the project. Default:"Project".export_directory(str): Directory path to export results. Default:"exports".file_location(str): Path to the trajectory file. Default:"".range_of_frames(Tuple[int, int]): Frame range to process;(0, -1)means all frames. Default:(0, -1).apply_pbc(bool): Whether to apply periodic boundary conditions. Default:False.verbose(bool): Whether to print detailed info and progress bars. Default:False.save_logs(bool): Whether to save logs during runs. Default:False.save_performance(bool): Whether to save performance data. Default:False.
Cutoff
Distance cutoff between two node types for neighbor searching.
Attributes
type1(str): Symbol of the first node type.type2(str): Symbol of the second node type.distance(float): Maximum distance for this pair to be considered neighbors.
Methods
__str__(): Returns formatted string representation.get_distance() -> float: Returns the cutoff distance.
ClusteringSettings
Configuration parameters for the clustering algorithm. Controls which clustering strategy is selected, the neighbor search method, node filtering, cutoff distances, and optional coordination or shared-neighbor analysis modes.
Attributes
criterion(str): Clustering criterion, either"distance"or"bond". Default:"distance".neighbor_searcher(str): Method for neighbor searching, e.g.,"kd_tree". Default:"kd_tree".node_types(List[str]): List of node types.node_masses(List[float]): List of node masses in reduced units.connectivity(List[str]): Connectivity specifications.cutoffs(List[Cutoff]): List of cutoff distances.with_printed_unwrapped_clusters(bool): Flag to print unwrapped clusters. Default:False.print_mode(str): Print mode among"all","connectivity","individual", or"none". Default:"none".with_coordination_number(bool): Flag to enable coordination number constraints. Default:False.coordination_mode(str): Mode for coordination number calculation, either:"all_types": counts all the neighbors regardless of type."same_type": counts only neighbors of the same type as the central node."different_type": counts only neighbors of different types than the central node.<node_type>: counts only neighbors of the specified node type, i.e., ‘O’, ‘Si’ etc. Default:"all_types".
coordination_range(List[int]): Minimum and maximum coordination numbers to consider. Default:[]. (e.g.,[4, 6])with_pairwise(bool): Flag to enable pairwise coordination number constraints. Default:False.with_mixing(bool): Flag to enable mixing coordination number constraints. Default:False.with_alternating(bool): Flag to enable alternating coordination number constraints. Default:False.with_connectivity_name(str): Name of the connectivity to apply coordination number constraints to. Default:"".with_number_of_shared(bool): Flag to enable shared neighbor constraints. Default:False.shared_mode(str): Mode for shared neighbor calculation, either:"all_types": counts all shared neighbors regardless of type."same_type": counts only shared neighbors of the same type as the central node."different_type": counts only shared neighbors of different types than the central node.<node_type>: counts only shared neighbors of the specified node type, i.e., ‘O’, ‘Si’ etc. Default:"all_types".
shared_threshold(int): Minimum number of shared neighbors required. Default:1.shared_threshold_mode(str): Mode for shared neighbor threshold behavior, either:"exact": requires exactly the specified number of shared neighbors to account the connectivity between the nodes."minimum": requires at least the specified number of shared neighbors to account the connectivity between the nodes."maximum": requires at most the specified number of shared neighbors to account the connectivity between the nodes. Default:"exact".
Methods
get_cutoff(type1: str, type2: str) -> Optional[float]: Returns the cutoff distance between two node types orNone.__str__(): Returns a formatted multi-line string detailing the clustering settings.
AnalysisSettings
Configuration parameters controlling which analyzers are enabled. Each with_* flag enables the corresponding analyzer in the pipeline. Setting with_all enables every available analyzer at once.
Attributes
Boolean flags per analysis type, e.g.,
with_average_cluster_size,with_largest_cluster_size,with_concentration,with_gyration_radius, etc.overwrite(bool): Whether to overwrite existing output files. Default:True.with_all(bool): Enables all analysis flags at once. Default:False.
Methods
get_analyzers() -> List[str]: Returns list of analyzer class names based on active flags.__str__(): Returns formatted string reporting enabled analyzers.
LatticeSettings
Configuration parameters for the simulation cell lattice. Controls whether a custom lattice is applied, the lattice matrix values, and whether the lattice is read from an external file.
Attributes
lattice(np.ndarray): Base lattice matrix. Default zero 3x3 matrix.apply_custom_lattice(bool): Whether to apply a user-defined lattice. Default:False.custom_lattice(np.ndarray): Custom lattice matrix. Default zero 3x3 matrix.get_lattice_from_file(bool): Whether to load lattice from a file. Default:False.lattice_file_location(str): Path to lattice file. Default:"./".apply_lattice_to_all_frames(bool): Whether to apply lattice across all frames. Default:True.
Methods
__str__(): Returns detailed string representation including lattice matrices.
Settings
Composite configuration object for the entire analysis pipeline. Aggregates general, lattice, clustering, and analysis sub-settings into a single object. Constructed via SettingsBuilder which validates constraints between fields.
Attributes
project_name(str): Project identifier. Default:"default".export_directory(str): Base directory for exports. Default:"export".file_location(str): Path of trajectory file. Default:"./".range_of_frames(Tuple[int, int]): Frame processing range. Default:(0, -1).apply_pbc(bool): Apply periodic boundary conditions. Default:True.verbose,save_logs,save_performance(bool): Flags for verbosity and saving outputs.Sub-settings as dataclass fields:
general(GeneralSettings)lattice(LatticeSettings)clustering(ClusteringSettings)analysis(AnalysisSettings)
Properties
output_directory(str): Combines export directory and project name.
Methods
set_range_of_frames(start: int, end: Optional[int] = None): Sets the frame range with validation.__str__(): Returns formatted string summarizing major settings (excluding general).
SettingsBuilder
Builder for constructing and validating a Settings object. Provides a fluent interface for setting sub-configurations. Each with_* method validates its input and returns self for chaining. Call build() to obtain the final Settings instance.
Methods
with_lattice(lattice: LatticeSettings) -> SettingsBuilder: Sets lattice settings.with_general(general: GeneralSettings) -> SettingsBuilder: Sets general settings with validation.with_analysis(analysis: AnalysisSettings) -> SettingsBuilder: Sets analysis settings.with_clustering(clustering: ClusteringSettings) -> SettingsBuilder: Sets clustering settings with validations on criteria, connectivities, coordination number modes, flags consistency, and list lengths.build() -> Settings: Returns the validated, constructedSettingsinstance.
Usage Example
from nexus import SettingsBuilder, main
import nexus.config.settings as c
general = c.GeneralSettings(
project_name="MyProject",
export_directory="results",
file_location="trajectory.xyz",
range_of_frames=(0, 100),
apply_pbc=True,
verbose=True,
save_logs=True,
save_performance=True
)
lattice = c.LatticeSettings(
apply_custom_lattice=True,
custom_lattice = np.eye(3)*10.0
)
clustering = c.ClusteringSettings(
criterion="distance",
node_types=["A", "B"],
node_masses=[1.0, 1.5],
connectivity=["A-B", "B-A"],
cutoffs=[c.Cutoff("A", "B", 3.5)],
with_printed_unwrapped_clusters=True,
print_mode="all"
)
analysis = c.AnalysisSettings(
with_average_cluster_size=True,
with_concentration=True,
with_all=False
)
settings = (SettingsBuilder()
.with_general(general)
.with_lattice(lattice)
.with_clustering(clustering)
.with_analysis(analysis)
.build()
)
This reference summarizes all the key settings classes and the builder for configuring the package’s runtime behavior.
core Module
Node
Dataclass representing a node in the simulation frame. A Node is the fundamental unit of the cluster analysis pipeline. It stores properties (symbol, position, mass) and participates in the union-find algorithm through its parent field. Nodes self-parent by default, meaning each node starts as the root of its own cluster.
Attributes
symbol(str): Chemical symbol of the atom (e.g., “Si”, “O”).node_id(int): Unique identifier (auto-incremented if not provided).position(np.ndarray): 3D coordinates of the node.parent(Optional[Node]): Reference to parent node; defaults to self if not set.neighbors(List[Node]): List of neighboring nodes.cluster_id(Optional[int]): Identifier of the cluster the node belongs to; defaults to its own node_id.distances(Optional[List[float]]): Distances to each neighbor.indices(Optional[List[int]]): Indices corresponding to neighbors.mass(Optional[float]): Mass of the node; defaults to 1.0 if not set.coordination(Optional[int]): Coordination number; defaults to 0.other(Optional[List[str]]): List for additional arbitrary attributes.
Initialization
Auto-assigns default values for optional attributes after creation:
Default position as zero vector if none provided.
Auto-incrementation of
node_idif none assigned.Defaults for mass, coordination, parent, cluster_id, neighbors, and other attributes.
Methods
wrap_position(position: np.ndarray, lattice: np.ndarray) -> np.ndarrayStatic method.
Wraps a given position vector inside a periodic box defined by the lattice matrix.
add_neighbor(node: Node) -> NoneAdds a node to the neighbors list.
reset_parent() -> NoneResets the parent attribute to self.
set_coordination(coordination: int) -> NoneSets the coordination number.
__str__() -> strReturns a readable string representation showing node id, symbol, coordination, neighbor count, and position.
__repr__() -> strSame as
__str__for convenience.
Here’s the API reference for the Cluster class in the core module:
Cluster
Represents a group of connected nodes forming a single cluster. A cluster is produced by a clustering strategy and consumed by analyzers. It holds the set of member nodes and provides methods to compute physical properties such as unwrapped positions, center of mass, gyration radius, percolation probability, and order parameter.
Initialization
Cluster(connectivity: str, root_id: int, size: int, settings: Settings, lattice: np.ndarray)
Creates a cluster with a connectivity label, root node ID, size, settings, and lattice matrix.
Parameters
connectivity(str): Connectivity descriptor (e.g.,"Si-O-Si"or"Si4-Si4").root_id(int): ID of the root node defining the cluster.size(int): Number of nodes in the cluster.settings(Settings): Configuration settings for the analysis.lattice(np.ndarray): 3×3 lattice matrix defining the periodic simulation box.
Attributes
Basic Properties:
nodes(List[Node]): List of nodes belonging to the cluster.connectivity(str): Connectivity descriptor identifying cluster type.root_id(int): Unique identifier of the cluster’s root node.size(int): Number of nodes in the cluster.settings(Settings): Configuration settings.lattice(np.ndarray): 3×3 lattice matrix.symbols(list): Element symbols of cluster nodes.indices(list): Global node IDs._inv_lattice(np.ndarray): Inverse of the lattice matrix for coordinate transformations._all_connectivities(Set[str]): Set of all connectivity types in the analysis.
Structural Properties:
center_of_mass(np.ndarray): Wrapped center of mass position in Cartesian coordinates.unwrapped_positions(np.ndarray): Node positions unwrapped across periodic boundaries.gyration_radius(float): Radius of gyration quantifying spatial extent.
Percolation Properties:
percolation_probability(str): Directions in which cluster percolates (e.g.,"x","xy","xyz").order_parameter(list): Order parameters[P∞_x, P∞_y, P∞_z]for each dimension.is_percolating(bool):Trueif cluster percolates in all three dimensions.is_spanning(bool):Trueif cluster is the largest in its connectivity type.
Connectivity Data:
linkages(List[Tuple[int, int]]): List of node ID pairs representing bonds between networking nodes._linkage_set(Set[Tuple[int, int]]): Internal set for efficient linkage tracking during unwrapping.period_vectors(List[np.ndarray]): Period vectors detected during BFS unwrapping, representing displacements across periodic boundaries. Empty for non-periodic clusters.decoration_atoms(Dict[int, Dict]): Dictionary of decorating nodes (e.g., bridging atoms in bond-based clustering). Keys are node IDs, values containsymbol,position, andcoordination.
Statistical Properties:
total_nodes(int): Total number of networking nodes in the system (for normalization).concentration(float): Fractional concentration:size / total_nodes.
Methods
add_node(node: Node) -> NoneAdds a node to the cluster and assigns the cluster ID to the node.
Parameters:
node(Node): Node object to add.
Returns: None
set_lattice(lattice: np.ndarray) -> NoneUpdates the lattice matrix and its inverse.
Parameters:
lattice(np.ndarray): New 3×3 lattice matrix.
Returns: None
get_nodes() -> List[Node]Returns the list of nodes in the cluster.
Returns: List of
Nodeobjects.
get_connectivity() -> strReturns the connectivity descriptor string.
Returns: Connectivity descriptor.
get_size() -> intReturns the number of nodes in the cluster.
Returns: Cluster size (integer).
set_indices_and_positions(positions_dict: Dict[int, np.ndarray]) -> NoneSets node indices, symbols, and unwrapped positions from a dictionary.
Parameters:
positions_dict(Dict[int, np.ndarray]): Mapping of node IDs to unwrapped positions.
Returns: None
Behavior: Populates
symbols,indices, andunwrapped_positionsattributes.
calculate_center_of_mass() -> NoneCalculates the cluster’s center of mass in unwrapped coordinates.
Returns: None
Behavior:
Computes mean of unwrapped positions.
Wraps center of mass back into primary periodic cell.
Translates all unwrapped positions and decoration atoms to center the cluster around the wrapped COM.
Updates
center_of_massattribute.
calculate_gyration_radius() -> NoneComputes the radius of gyration about the center of mass.
Returns: None
Formula: $R_g = \sqrt{\frac{1}{N} \sum_{i=1}^{N} |\mathbf{r}i - \mathbf{r}{\text{COM}}|^2}$
Behavior: Returns 0.0 for clusters with size ≤ 1.
calculate_percolation_probability() -> NoneDetermines percolation using the period vector algorithm.
Returns: None
Method: Period vector detection following Livraghi et al. (2021) J. Chem. Theory Comput.
Behavior:
Uses pre-computed
period_vectorscollected duringcalculate_unwrapped_positions(), avoiding a redundant BFS traversal.Calculates percolation dimension from linear independence of period vectors.
Updates
percolation_probability(e.g.,"xyz") andis_percolatingflag.
Percolation Theory Significance: True percolation requires the cluster to connect to itself across periodic boundaries through actual bond connectivity, not just spatial extent. This method distinguishes genuine percolating clusters from large finite clusters.
_calculate_period_dimension() -> intCalculates the algebraic dimension of the period vector set using singular value decomposition.
Returns: Integer (0, 1, 2, or 3) indicating percolation dimensionality.
Method: Reads from
self.period_vectors. Computes rank of period vector matrix via SVD with numerical tolerance.
_get_percolation_directions() -> strDetermines which Cartesian directions (x, y, z) the cluster percolates in.
Returns: String containing percolating directions (e.g.,
"xy"for percolation in x and y).Behavior: Reads from
self.period_vectors. Checks which directions have significant fractional components (> 0.5 lattice units) in linearly independent period vectors.
_get_independent_periods(period_vectors: List[np.ndarray]) -> List[np.ndarray]Extracts linearly independent period vectors.
Parameters:
period_vectors(List[np.ndarray]): All detected period vectors.
Returns: List of up to 3 linearly independent period vectors.
Method: Uses incremental matrix rank calculation to select independent vectors.
calculate_order_parameter() -> NoneCalculates order parameters for each dimension based on percolation status.
Returns: None
Formula: $P_\infty = \frac{N_{\text{cluster}}}{N_{\text{total}}}$ for percolating dimensions, 0 otherwise.
Behavior: Assigns
[P∞, 0, 0]for 1D percolation,[P∞, P∞, 0]for 2D, and[P∞, P∞, P∞]for 3D.
calculate_concentration() -> NoneComputes the cluster concentration relative to total networking nodes.
Returns: None
Formula: $c = \frac{N_{\text{cluster}}}{N_{\text{total}}}$
Behavior: Updates
concentrationattribute.
_unwrap_vector(vector: np.ndarray) -> np.ndarrayApplies minimum image convention to unwrap a displacement vector.
Parameters:
vector(np.ndarray): Displacement vector to unwrap.
Returns: Unwrapped vector in Cartesian coordinates.
Method: Converts to fractional coordinates, subtracts nearest integer, converts back to Cartesian.
calculate_unwrapped_positions() -> NoneUnwraps all cluster node positions relative to the root node using breadth-first traversal, and simultaneously detects period vectors for percolation analysis.
Returns: None
Behavior:
Performs a single BFS starting from root node.
For each neighbor, calculates unwrapped position by adding minimum image displacement to current position.
When encountering a previously visited node through a different path, records the displacement as a period vector.
Records linkages between nodes.
For bond-based clustering, identifies and unwraps decorating atoms (e.g., bridging nodes).
Displays progress bar if verbose mode enabled.
Handles both
"distance"and"bond"clustering criteria.
Output: Populates
unwrapped_positions,linkages,period_vectors, anddecoration_atomsattributes.
__str__() -> strReturns a concise string representation of the cluster.
Format:
"{root_id} {connectivity} {size} {is_percolating} {node_ids}"Returns: Human-readable cluster summary (truncated to 20 node IDs).
__repr__() -> strReturns the same representation as
__str__().
Frame
Represents a single snapshot of a trajectory. A frame holds the raw node data, the simulation cell lattice, and the clusters produced by a clustering strategy. It is created by a reader, populated during the analysis pipeline, and consumed by analyzers.
Attributes
frame_id(int): Unique identifier of the frame.nodes(List[Node]): List of nodes contained in this frame.lattice(np.ndarray): 3x3 lattice matrix defining the periodic box._data(Dict[str, np.ndarray]): Internal data dictionary holding node properties such as symbols and positions._settings(Settings): Settings object with configuration options affecting the frame.clusters(Optional[List[Cluster]]): List of clusters detected in the frame.connectivities(Optional[List[str]]): List of connectivity descriptors associated with clusters.
Initialization and Validation
__post_init__(): Ensures thatnodesis a list andlatticeis a 3x3 numpy array.
Methods
initialize_nodes() -> NoneInitializes the node list from
_data, filtering by node types specified in the clustering settings.Assigns node IDs sequentially starting at zero.
set_lattice(lattice: np.ndarray) -> NoneSets the frame lattice matrix.
Validates that input lattice is a 3x3 non-singular numpy array.
get_lattice() -> Optional[np.ndarray]Returns the lattice matrix.
get_unique_elements() -> List[str]Returns a list of unique node symbols present in the frame.
get_node_by_id(node_id: int) -> Optional[Node]Finds and returns a node by its unique node ID, or None if not found.
get_positions() -> np.ndarrayReturns an array of all node positions in the frame.
get_positions_by_element() -> Dict[str, np.ndarray]Groups node positions by their element symbol and returns a dictionary mapping symbol to position arrays.
get_wrapped_positions() -> np.ndarrayReturns positions wrapped inside the periodic lattice box using utility wrapping function.
get_wrapped_positions_by_element() -> Dict[str, np.ndarray]Returns wrapped positions grouped by element symbol.
get_clusters() -> List[Cluster]Returns the list of clusters in the frame.
get_nodes() -> List[Node]Returns the list of nodes in the frame.
get_networking_nodes() -> intReturns the total number of nodes participating in clusters by summing cluster sizes.
get_connectivities() -> List[str]Returns the list of connectivities associated with the frame.
set_connectivities(connectivities: List[str]) -> NoneSets the frame’s connectivities.
add_cluster(cluster: Cluster) -> NoneAdds a cluster to the frame’s cluster list, initializing the list if necessary.
set_clusters(clusters: List[Cluster]) -> NoneSets the frame’s clusters, assigns frame ID to clusters, and sets their lattice matrix.
get_concentration() -> Dict[str, float]Calculates and returns concentrations for each connectivity type based on cluster sizes relative to total nodes, ensuring all connectivities have a value.
__len__() -> intReturns the number of nodes in the frame.
__str__() -> strReturns a readable string summary of the frame including frame ID, node count, and cluster count.
__repr__() -> strReturns detailed string representation including first node data and lattice matrix.
__del__() -> NoneCleans up references to nodes, clusters, lattice, data, and connectivities.
System
Manages trajectory data and provides frame-level access through a file reader. Wraps a reader with lazy frame iteration. On initialization it configures the reader’s filename from settings and triggers a file scan to index frame offsets. Frames can then be accessed individually or iterated over as a generator.
Initialization
System(reader: BaseReader, settings: Settings)
Constructs a System instance with a file reader and configuration settings.
Parameters
reader(BaseReader): File reader object used to load trajectory data.settings(Settings): Configuration object with parameters like file location and frame ranges.
Attributes
reader(BaseReader): The trajectory file reader.settings(Settings): Settings controlling system and file paths.current_frame(Optional[Frame]): Currently loaded frame orNoneif none loaded._current_frame_index(Optional[int]): Internal index tracking current frame position._num_frames(Optional[int]): Cached total number of frames in the trajectory.
Methods
load_frame(frame_index: int) -> boolLoads a specific frame by index.
Validates index against settings range.
Uses reader’s
parsemethod to load the frame.Updates
current_frameand current index if successful.Returns
Trueif frame loaded;Falseotherwise.
get_frame(frame_index: int) -> Optional[Frame]Retrieves a frame by index, loading if necessary.
Returns the
Frameobject orNoneif loading fails.
get_num_frames() -> intReturns total number of frames in trajectory.
Uses cached value if available.
Falls back to counting frames by iteration if needed.
Returns
0on error or if no frames detected.
iter_frames() -> Generator[Frame, None, None]Generator yielding frames within settings-defined frame range.
Uses reader’s frame indices if available.
Iterates frame-by-frame loading each one on demand.
Handles loading exceptions gracefully by continuing.
__iter__() -> SystemMakes
Systemiterable, resetting internal frame index to start.
__next__() -> FrameReturns next frame in iteration sequence respecting frame range.
Raises
StopIterationwhen range or data is exhausted.Loads each frame on-demand using
load_frame().
analysis Module
StrategyFactory
Factory class responsible for creating, registering, and retrieving clustering strategy instances. Implements the factory design pattern to manage a collection of clustering strategies. Selects the appropriate one based on the clustering settings flags.
Initialization
StrategyFactory(frame: Frame, settings: Settings)
Creates a StrategyFactory instance by initializing and registering available clustering strategies using the provided frame and settings.
Parameters
frame(Frame): The frame of trajectory data on which clustering strategies operate.settings(Settings): Configuration settings to determine the appropriate strategy.
Attributes
_strategies(dict): Internal mapping of strategy class names to instantiated strategy objects.
Methods
register_strategy(strategy: BaseClusteringStrategy) -> NoneRegisters a clustering strategy instance in the factory.
Adds the strategy to the internal dictionary keyed by its class name.
get_strategy(settings: Settings) -> Optional[BaseClusteringStrategy]Returns the appropriate clustering strategy instance based on clustering configuration flags in
settings.Strategy selection logic:
If the clustering has coordination number or alternating cluster enabled (but not number of shared neighbors), returns the
CoordinationStrategy.If number of shared neighbors is enabled, returns the
SharedStrategy.If coordination features disabled and criterion is
"distance", returns theDistanceStrategy.If coordination features disabled and criterion is
"bond", returns theBondingStrategy.Returns
Noneif no matching strategy is found.
DistanceStrategy
Clustering strategy based on a direct distance criterion. Connects any two nodes of specified types that are within a cutoff distance of each other. Requires a 2-element connectivity specification (e.g., ["Si", "Si"]).
Inheritance
Inherits from BaseClusteringStrategy.
Initialization
DistanceStrategy(frame: Frame, settings: Settings)
Creates a distance-based clustering strategy for the given frame.
Parameters
frame(Frame): The trajectory frame containing nodes to be clustered.settings(Settings): Configuration settings specifying clustering parameters, cutoffs, and node types.
Attributes
frame(Frame): The frame being processed.clusters(List[Cluster]): List of identified clusters._lattice(np.ndarray): Lattice matrix from the frame._nodes(List[Node]): List of nodes in the frame._settings(Settings): Configuration settings._counter(int): Counter tracking number of clusters formed._neighbor_searcher(NeighborSearcher): Helper object for finding neighbors based on distance.
Methods
find_neighbors() -> NoneExecutes neighbor searching algorithm using the internal neighbor searcher.
Identifies all nodes within cutoff distances of each other.
get_connectivities() -> List[str]Returns list of connectivity descriptors for the clustering.
Validates that connectivity configuration is a two-element list.
Returns formatted connectivity string (e.g.,
["A-B"]).Raises
ValueErrorif connectivity format is invalid.
build_clusters() -> List[Cluster]Constructs clusters by applying union-find algorithm on nodes within distance cutoffs.
Process flow:
Filters networking nodes based on configured node types.
Iterates through nodes, connecting neighbors matching connectivity criteria using union operation.
Groups nodes by their root to form clusters.
Creates
Clusterobjects for groups with more than one node.Calculates cluster properties: unwrapped positions, center of mass, gyration radius, percolation probability, concentration, and order parameter.
Uses progress bars (tqdm) for visual feedback if verbosity enabled.
Returns list of formed clusters with computed properties.
Usage Example
from nexus import SettingsBuilder, main
import nexus.config.settings as c
general_settings = c.GeneralSettings(
...
)
lattice_settings = c.LatticeSettings(
...
)
analysis_settings = c.AnalysisSettings(
...
)
clustering_settings = c.ClusteringSettings(
criterion="distance",
node_types=["A", "B"],
connectivity=["A", "B"],
cutoffs=[c.Cutoff("A", "B", 3.5)]
with_connectivity_name='A-B',
)
settings = (
SettingsBuilder()
.with_general(config_general) # General settings \
.with_lattice(config_lattice) # Lattice settings \
.with_clustering(config_clustering) # Clustering settings \
.with_analysis(config_analysis) # Analysis settings \
.build() # Don't forget to build the settings object
)
main(settings)
BondingStrategy
Clustering strategy that connects networking nodes via a bridging node. Uses a 3-element connectivity pattern (e.g., ["Si", "O", "Si"]) to form clusters by linking networking nodes that share a common bridging node of the specified type.
Inheritance
Inherits from BaseClusteringStrategy.
Initialization
BondingStrategy(frame: Frame, settings: Settings)
Creates a bond-based clustering strategy for the given frame.
Parameters
frame(Frame): The trajectory frame containing nodes to be clustered.settings(Settings): Configuration settings specifying clustering parameters, cutoffs, and node types.
Attributes
frame(Frame): The frame being processed.clusters(List[Cluster]): List of identified clusters._lattice(np.ndarray): Lattice matrix from the frame._nodes(List[Node]): List of nodes in the frame._settings(Settings): Configuration settings._counter(int): Counter tracking number of clusters formed._neighbor_searcher(NeighborSearcher): Helper object for finding neighbors based on distance.
Methods
find_neighbors() -> NoneExecutes neighbor searching algorithm using the internal neighbor searcher.
Identifies all nodes within cutoff distances of each other.
get_connectivities() -> List[str]Returns list of connectivity descriptors for the clustering.
Validates that connectivity configuration is a three-element list.
Returns formatted connectivity string (e.g.,
["Si-O-Si"]).Raises
ValueErrorif connectivity format is invalid.
build_clusters() -> List[Cluster]Constructs clusters by identifying nodes connected through bridging atoms.
Process flow:
Filters networking nodes based on configured node types, excluding bridge atoms.
Iterates through nodes of the first type, examining their neighbors for bridge atoms.
For each bridge atom, checks its neighbors for nodes of the third type.
Connects matching node pairs using union operation.
Groups nodes by their root to form clusters.
Creates
Clusterobjects for groups with more than one node.Calculates cluster properties: unwrapped positions, center of mass, gyration radius, percolation probability, concentration, and order parameter.
Uses progress bars (tqdm) for visual feedback if verbosity enabled.
Returns list of formed clusters with computed properties.
Usage Example
from nexus import SettingsBuilder, main
import nexus.config.settings as c
general_settings = c.GeneralSettings(
...
)
lattice_settings = c.LatticeSettings(
...
)
analysis_settings = c.AnalysisSettings(
...
)
clustering_settings = c.ClusteringSettings(
criterion="bond",
node_types=["A", "B", "C"],
connectivity=["A", "B", "C"],
cutoffs=[
c.Cutoff("A", "B", 2.0),
c.Cutoff("B", "C", 2.0),
c.Cutoff("C", "A", 2.0)
]
)
settings = (
SettingsBuilder()
.with_general(general_settings)
.with_lattice(lattice_settings)
.with_clustering(clustering_settings)
.with_analysis(analysis_settings)
.build()
)
main(settings)
CoordinationStrategy
Clustering strategy that groups nodes by coordination number. Extends the bonding pattern by constraining the coordination number of networking nodes. Supports several pairing modes: pairwise, mixing, alternating, and default.
Inheritance
Inherits from BaseClusteringStrategy.
Initialization
CoordinationStrategy(frame: Frame, settings: Settings)
Creates a coordination-based clustering strategy for the given frame.
Parameters
frame(Frame): The trajectory frame containing nodes to be clustered.settings(Settings): Configuration settings specifying clustering parameters, coordination ranges, and node types.
Attributes
frame(Frame): The frame being processed.clusters(List[Cluster]): List of identified clusters._lattice(np.ndarray): Lattice matrix from the frame._nodes(List[Node]): List of nodes in the frame._settings(Settings): Configuration settings._counter(int): Counter tracking number of clusters formed._search_mode(str): Current search mode ("default","pairwise","mixing", or"alternating")._neighbor_searcher(NeighborSearcher): Helper object for finding neighbors based on distance.
Methods
find_neighbors() -> NoneExecutes neighbor searching algorithm using the internal neighbor searcher.
Calculates coordination number for all nodes after neighbor identification.
calculate_coordination(idx: int) -> NoneCalculates and sets the coordination number for a node at the specified index.
Coordination modes:
"all_types": Counts all neighboring nodes."same_type": Counts only neighbors with the same element symbol."different_type": Counts only neighbors with different element symbols."<node_type>": Counts only neighbors matching specified node type.
find(node: Node) -> NodeFinds the root node in the union-find structure with path compression.
Returns the root parent node.
union(node_1: Node, node_2: Node) -> NoneMerges two nodes into the same cluster by connecting their roots.
get_connectivities() -> List[str]Generates connectivity descriptors based on clustering criterion and coordination settings.
Supports multiple coordination modes:
Pairwise: Nodes with identical coordination numbers (e.g.,
["Si4-Si4", "Si5-Si5"]).Mixing: All combinations of coordination numbers within range (e.g.,
["Si4-Si4", "Si4-Si5", "Si5-Si5"]).Alternating: Consecutive coordination pairs (e.g.,
["Si4-Si4", "Si4-Si5", "Si5-Si5", "Si5-Si6"]).Default: Single user-specified connectivity name.
Returns list of formatted connectivity strings.
build_clusters() -> List[Cluster]Constructs clusters by grouping nodes with matching coordination criteria.
Process flow:
Filters networking nodes based on node types and criterion.
Generates connectivities based on coordination number range and mode.
For each connectivity, finds and connects matching node pairs using union-find algorithm.
Groups nodes by their root to form clusters.
Creates
Clusterobjects for groups with more than one node.Calculates cluster properties: unwrapped positions, center of mass, gyration radius, percolation probability, concentration, and order parameter.
Uses progress bars (tqdm) for visual feedback if verbosity enabled.
Returns list of formed clusters with computed properties.
_find_cluster(networking_nodes: List[Node], connectivity: str, z1: int, z2: int) -> NoneInternal helper method to find clusters for a specific connectivity pattern.
Parameters:
networking_nodes: List of nodes to cluster.connectivity: Connectivity descriptor string.z1: First coordination number constraint.z2: Second coordination number constraint.
Applies union operations on node pairs matching coordination criteria.
Creates and stores resulting clusters.
Usage Examples
The following examples use the CoordinationStrategy to cluster SiOz polyhedra in a SiO2 system with various approaches.
Example 1
The first example clusters SiOz polyhedra with a pairwise coordination number constraint in the interval [4, 6].
It will find the following connectivities: SiO4-SiO4, SiO5-SiO5, SiO6-SiO6.
Note that if the criterion is bond, the connectivity must be specified with 3 elements, the first and last being the networking nodes while the one in the middle being the ‘bridging’ node, e.g., [“Si”, “O”, “Si”] for SiOz-SiOz with z the coordination number.
In the case of distance, the connectivity must be specified with 2 elements, the first and last being the networking nodes, e.g., [“Si”, “Si”] for Siz-Siz with z the coordination number.
from nexus import SettingsBuilder, main
import nexus.config.settings as c
general_settings = c.GeneralSettings(
...
)
lattice_settings = c.LatticeSettings(
...
)
analysis_settings = c.AnalysisSettings(
...
)
clustering_settings = c.ClusteringSettings(
criterion="bond",
node_types=["Si", "O"],
connectivity=["Si", "O", "Si"],
cutoffs=[
c.Cutoff("Si", "O", 2.3),
c.Cutoff("Si", "Si", 3.5),
c.Cutoff("O", "O", 3.05),
],
with_coordination_number=True,
coordination_mode="different_type",
coordination_range=[4, 6],
with_pairwise=True
)
# OR
clustering_settings = c.ClusteringSettings(
criterion="distance",
node_types=["Si", "Si"],
connectivity=["Si", "Si"],
cutoffs=[
c.Cutoff("Si", "Si", 3.5),
],
with_coordination_number=True,
coordination_mode="same_type", # "all_types" or "same_type" or "different_type" or "<node_type>"
coordination_range=[4, 6],
with_pairwise=True
)
settings = (
SettingsBuilder()
.with_general(general_settings)
.with_lattice(lattice_settings)
.with_clustering(clustering_settings)
.with_analysis(analysis_settings)
.build()
)
main(settings)
Example 2
The second example clusters SiOz polyhedra with a mixing coordination number constraint in the interval [4, 6].
It will find the following connectivities: SiO4-SiO4, SiO4-SiO5, SiO5-SiO5, SiO4-SiO6, SiO5-SiO6, SiO6-SiO6.
...
clustering_settings = c.ClusteringSettings(
criterion="bond",
node_types=["Si", "O"],
connectivity=["Si", "O", "Si"],
cutoffs=[
c.Cutoff("Si", "O", 2.3),
c.Cutoff("Si", "Si", 3.5),
c.Cutoff("O", "O", 3.05),
],
with_coordination_number=True,
coordination_mode="different_type",
coordination_range=[4, 6],
with_pairwise=False,
with_mixing=True
)
...
Example 3
The third example clusters SiOz polyhedra with an alternating coordination number constraint in the interval [4, 6].
It will find the following connectivities: SiO4-SiO4, SiO4-SiO5, SiO5-SiO5, SiO4-SiO6, SiO5-SiO6, SiO6-SiO6.
...
clustering_settings = c.ClusteringSettings(
criterion="bond",
node_types=["Si", "O"],
connectivity=["Si", "O", "Si"],
cutoffs=[
c.Cutoff("Si", "O", 2.3),
c.Cutoff("Si", "Si", 3.5),
c.Cutoff("O", "O", 3.05),
],
with_coordination_number=True,
coordination_mode="different_type",
coordination_range=[4, 6],
with_pairwise=False,
with_mixing=False,
with_alternating=True
)
...
Example 4
The fourth example clusters all SiOz polyhedra with a coordination number constraint in the interval [5, 7].
In this settings, a custom name to the connectivity can be given, e.g. HD.
...
clustering_settings = c.ClusteringSettings(
criterion="bond",
node_types=["Si", "O"],
connectivity=["Si", "O", "Si"],
cutoffs=[
c.Cutoff("Si", "O", 2.3),
c.Cutoff("Si", "Si", 3.5),
c.Cutoff("O", "O", 3.05),
],
with_coordination_number=True,
coordination_mode="different_type",
coordination_range=[5, 7],
with_pairwise=False,
with_mixing=False,
with_alternating=False,
with_number_of_shared=False,
with_connectivity_name="HD"
)
...
Usage Example
from nexus import SettingsBuilder, main
import nexus.config.settings as c
general_settings = c.GeneralSettings(
...
)
lattice_settings = c.LatticeSettings(
...
)
analysis_settings = c.AnalysisSettings(
...
)
clustering_settings = c.ClusteringSettings(
criterion="bond",
node_types=["Si", "O"],
connectivity=["Si", "O", "Si"],
cutoffs=[
c.Cutoff("Si", "O", 2.3),
c.Cutoff("O", "O", 3.05),
c.Cutoff("Si", "Si", 3.5)
],
with_coordination_number=True,
coordination_mode="different_type",
coordination_range=[6, 6],
with_number_of_shared=True,
shared_mode="different_type",
shared_threshold=2,
with_pairwise=False,
with_mixing=False,
with_alternating=False,
with_connectivity_name="Stishovite"
)
settings = (
SettingsBuilder()
.with_general(general_settings)
.with_lattice(lattice_settings)
.with_clustering(clustering_settings)
.with_analysis(analysis_settings)
.build()
)
main(settings)
AnalyzerFactory
Factory class responsible for creating, registering, and retrieving analyzer instances. Implements the factory design pattern to manage a collection of analyzer objects.
Initialization
AnalyzerFactory(settings: Settings, verbose: bool = True)
Creates an AnalyzerFactory instance by initializing and registering all available analyzer types.
Parameters
settings(Settings): Configuration settings used to initialize analyzers with appropriate parameters.verbose(bool): Flag controlling verbosity of output. Default:True.
Attributes
_analyzers(dict): Internal mapping of analyzer class names to instantiated analyzer objects.
Methods
register_analyzer(analyzer: BaseAnalyzer) -> NoneRegisters an analyzer instance in the factory.
Adds the analyzer to the internal dictionary keyed by its class name.
get_analyzer(analyzer_name: str) -> Optional[BaseAnalyzer]Returns the analyzer instance corresponding to the specified name.
Returns
Noneif no analyzer with the given name is registered.Available analyzer names:
"AverageClusterSizeAnalyzer": Calculates average cluster sizes."ConcentrationAnalyzer": Computes cluster concentrations."LargestClusterSizeAnalyzer": Identifies largest cluster sizes."SpanningClusterSizeAnalyzer": Measures spanning cluster properties."PercolationProbabilityAnalyzer": Determines percolation probabilities."OrderParameterAnalyzer": Calculates order parameters."ClusterSizeDistributionAnalyzer": Generates cluster size distributions."GyrationRadiusAnalyzer": Computes radii of gyration."CorrelationLengthAnalyzer": Calculates correlation lengths.
AverageClusterSizeAnalyzer
Computes the weight-average cluster size <S>. Uses the formula <S> = sum(s^2 * n(s)) / sum(s * n(s)) where s is the cluster size and n(s) the number of clusters of size s. Percolating clusters are excluded to focus on the finite-cluster distribution.
Inheritance
Inherits from BaseAnalyzer.
Initialization
AverageClusterSizeAnalyzer(settings: Settings)
Creates an analyzer for computing weight-average cluster sizes across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_average_sizes(Dict[str, List[float]]): Raw per-frame average cluster sizes for each connectivity type._raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity.average_sizes(Dict[str, float]): Final mean average cluster size for each connectivity type.std(Dict[str, float]): Standard deviation of average cluster sizes across frames.error(Dict[str, float]): Standard error of average cluster sizes across frames.concentrations(Dict[str, float]): Mean concentration for each connectivity type._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneAnalyzes a single frame to compute the weight-average cluster size for each connectivity type.
Calculation Methodology:
For each connectivity, extracts all non-percolating clusters.
Computes the weight-average size using the formula: $$ \langle S \rangle = \frac{\sum_s s^2 n_s}{\sum_s s n_s} $$ where $s$ is the cluster size and $n_s$ is the number of clusters of size $s$.
Stores raw values for later aggregation across all frames.
Percolation Theory Significance:
The weight-average cluster size is a second moment of the cluster size distribution.
This quantity is proportional to the percolation susceptibility $\chi$, which measures the system’s response to changes in connectivity.
Near the percolation threshold $p_c$, $\langle S \rangle$ diverges as a power law: $\langle S \rangle \sim |p - p_c|^{-\gamma}$, where $\gamma$ is a critical exponent.
Percolating clusters are excluded because they represent infinite clusters in the thermodynamic limit and would dominate the calculation.
finalize() -> Dict[str, Dict[str, float]]Computes final statistics by averaging over all processed frames.
Calculation Details:
Calculates mean average cluster size across all frames.
Computes standard deviation and standard error using Bessel’s correction (ddof=1).
get_result() -> Dict[str, Dict[str, float]]Returns the finalized analysis results without recalculation.
Provides access to concentrations, average cluster sizes, standard deviations, and errors for all connectivity types.
print_to_file() -> NoneWrites finalized results to a CSV-formatted data file.
Output file:
average_cluster_size.datin the export directory.Includes header with metadata: date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Average_cluster_size,Standard_deviation,Standard_errorAutomatically removes duplicate lines to prevent data redundancy.
_write_header() -> NoneInternal method to write the file header with analysis metadata.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
ClusterSizeDistributionAnalyzer
Computes the cluster size distribution n(s) for each connectivity type. Counts how many clusters of each size s exist per connectivity across all frames. Percolating clusters are excluded from the distribution. Results are written to one file per connectivity.
Inheritance
Inherits from BaseAnalyzer.
Initialization
ClusterSizeDistributionAnalyzer(settings: Settings)
Creates an analyzer for computing cluster size distributions across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_size_distribution(Dict[str, Dict[int, List[int]]]): Raw per-frame counts of clusters for each size and connectivity type._raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity.size_distribution(Dict[str, Dict[int, float]]): Final total number of clusters count per size for each connectivity type.std(Dict[str, Dict[int, float]]): Standard deviation of cluster counts for each size across frames.concentrations(Dict[str, float]): Mean concentration for each connectivity type._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneAnalyzes a single frame to compute the cluster size distribution for each connectivity type.
Calculation Methodology:
For each connectivity, extracts all non-percolating clusters.
Counts the occurrence of each cluster size $s$ to obtain $n_s$.
Stores raw counts for later averaging across all frames.
Percolation Theory Significance:
The cluster size distribution $n_s$ describes the total number of clusters of size $s$ accross each frames.
Near the percolation threshold $p_c$, this distribution exhibits power-law scaling: $n_s \sim s^{-\tau}$, where $\tau$ is the Fisher exponent (approximately 2.18 for 3D systems).
Percolating clusters are excluded to focus on finite cluster statistics, as the infinite percolating cluster would dominate near and above the threshold.
finalize() -> DictComputes final statistics by averaging cluster counts over all processed frames.
Calculation Details:
For each cluster size, calculates the total number of clusters in each frame.
Computes standard deviation accounting for frames where specific sizes may not appear (zero counts included).
Uses Bessel’s correction (ddof=1) for unbiased standard deviation estimation.
Idempotent operation: can be called multiple times safely.
Returns dictionary containing concentrations, size distributions, and standard deviations.
get_result() -> Dict[str, Dict]Returns the finalized analysis results without recalculation.
Provides access to concentrations, cluster size distributions, and standard deviations for all connectivity types.
print_to_file() -> NoneWrites finalized results to separate CSV-formatted data files for each connectivity type.
Output files:
cluster_size_distribution-{connectivity}.datin the export directory.Includes header with metadata: connectivity type, date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Cluster_size,N_clusters_per_frame,Standard_deviationData sorted by cluster size in descending order for convenient plotting.
Automatically removes duplicate lines to prevent data redundancy.
_write_header(connectivity: str) -> NoneInternal method to write the file header with analysis metadata for a specific connectivity type.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
ConcentrationAnalyzer
Computes the node concentration for each connectivity type. The concentration is defined as the ratio of the number of nodes participating in clusters of a given connectivity to the total number of networking nodes.
Inheritance
Inherits from BaseAnalyzer.
Initialization
ConcentrationAnalyzer(settings: Settings)
Creates an analyzer for computing cluster concentrations across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity type.concentrations(Dict[str, float]): Final mean concentration for each connectivity type.std(Dict[str, float]): Standard deviation of concentrations across frames.error(Dict[str, float]): Standard error of concentrations across frames._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneAnalyzes a single frame to compute the concentration for each connectivity type.
Calculation Methodology:
Extracts concentration values from the frame for each connectivity type.
Concentration is calculated as the ratio of nodes in clusters to total nodes in the system.
Stores raw values for later aggregation across all frames.
Percolation Theory Significance:
Concentration represents the fraction of the system participating in the cluster network.
Above the percolation threshold, the concentration of the spanning cluster corresponds to the order parameter $P_\infty$, which characterizes the phase transition.
Below threshold, concentration measures the extent of finite cluster formation.
finalize() -> Dict[str, Dict[str, float]]Computes final statistics by averaging over all processed frames.
Calculation Details:
Calculates mean concentration across all frames.
Computes standard deviation and error using Bessel’s correction (ddof=1).
Idempotent operation: can be called multiple times safely.
Returns dictionary containing concentrations, standard deviations, and error.
get_result() -> Dict[str, Dict[str, float]]Returns the finalized analysis results without recalculation.
Provides access to concentrations, standard deviations, and error for all connectivity types.
print_to_file() -> NoneWrites finalized results to a CSV-formatted data file.
Output file:
concentrations.datin the export directory.Includes header with metadata: date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Standard_deviation,Standard_errorAutomatically removes duplicate lines to prevent data redundancy.
_write_header() -> NoneInternal method to write the file header with analysis metadata.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
CorrelationLengthAnalyzer
Computes the correlation length (xi) of the cluster size distribution. Defined as xi^2 = sum(2 * R_s^2 * s^2 * n_s) / sum(s^2 * n_s) where R_s is the gyration radius and s the cluster size. Only non-percolating clusters contribute.
Inheritance
Inherits from BaseAnalyzer.
Initialization
CorrelationLengthAnalyzer(settings: Settings)
Creates an analyzer for computing correlation lengths across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_correlation_lengths(Dict[str, List[float]]): Raw per-frame correlation length values for each connectivity type._raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity.correlation_length(Dict[str, float]): Final mean correlation length for each connectivity type.std(Dict[str, float]): Standard deviation of correlation lengths across frames.error(Dict[str, float]): Standard error of correlation lengths across frames.concentrations(Dict[str, float]): Mean concentration for each connectivity type._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneAnalyzes a single frame to compute the correlation length for each connectivity type.
Calculation Methodology:
For each connectivity, extracts all non-percolating clusters.
Computes the correlation length using the weighted second moment of gyration radii: $$ \xi^2 = \frac{\sum_s 2 R_{g,s}^2 s^2 n_s}{\sum_s s^2 n_s} $$ where $R_{g,s}$ is the radius of gyration for clusters of size $s$, and $n_s$ is the number of such clusters.
Stores raw values for later aggregation across all frames.
Percolation Theory Significance:
The correlation length defines the characteristic size scale over which structural correlations persist in the system.
Near the percolation threshold $p_c$, $\xi$ diverges as a power law: $\xi \sim |p - p_c|^{-\nu}$, where $\nu$ is the correlation length critical exponent (approximately 0.88 for standard percolation in 3D systems).
Below threshold, $\xi$ represents the typical size of finite clusters and sets the cutoff for power-law scaling in the cluster size distribution.
At the percolation threshold, the divergence of $\xi$ reflects the emergence of correlations at all length scales, a hallmark of critical phenomena.
Percolating clusters are excluded as they represent infinite clusters whose spatial extent would dominate the calculation.
finalize() -> Dict[str, Dict[str, float]]Computes final statistics by averaging over all processed frames.
Calculation Details:
Calculates mean correlation length across all frames.
Computes standard deviation and error using Bessel’s correction (ddof=1).
Idempotent operation: can be called multiple times safely.
Returns dictionary containing concentrations, correlation lengths, standard deviations, and errors.
get_result() -> Dict[str, Dict[str, float]]Returns the finalized analysis results without recalculation.
Provides access to concentrations, correlation lengths, standard deviations, and errors for all connectivity types.
print_to_file() -> NoneWrites finalized results to a CSV-formatted data file.
Output file:
correlation_length.datin the export directory.Includes header with metadata: date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Correlation_length,Standard_deviation,Standard_errorAutomatically removes duplicate lines to prevent data redundancy.
_write_header() -> NoneInternal method to write the file header with analysis metadata.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
GyrationRadiusAnalyzer
Computes the mean gyration radius binned by cluster size for each connectivity. Collects the gyration radius of all non-percolating clusters, groups them by cluster size, and averages over all processed frames. Results are written to one file per connectivity.
Inheritance
Inherits from BaseAnalyzer.
Initialization
GyrationRadiusAnalyzer(settings: Settings)
Creates an analyzer for computing gyration radius distributions across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_gyration_radii(Dict[str, Dict[int, List[float]]]): Raw per-frame gyration radii for each cluster size and connectivity type._raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity.gyration_radii(Dict[str, Dict[int, float]]): Final averaged gyration radius for each cluster size and connectivity type.std(Dict[str, Dict[int, float]]): Standard deviation of gyration radii for each size across frames.concentrations(Dict[str, float]): Mean concentration for each connectivity type._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneCollects gyration radii of non-percolating clusters for each connectivity type, grouped by cluster size.
Calculation Methodology:
For each connectivity, extracts all non-percolating clusters.
Records the gyration radius $R_g$ for each cluster, binned by its size $s$.
The gyration radius is computed as: $$ R_g^2 = \frac{1}{N} \sum_{i=1}^{N} |\mathbf{r}i - \mathbf{r}{\text{COM}}|^2 $$ where $N$ is the number of nodes in the cluster, $\mathbf{r}i$ are node positions, and $\mathbf{r}{\text{COM}}$ is the center of mass.
Stores raw values for later averaging across all frames.
Percolation Theory Significance:
The gyration radius quantifies the spatial extent of clusters and their degree of compactness.
Near the percolation threshold, clusters exhibit fractal geometry with scaling relation: $R_g \sim s^{1/d_f}$, where $d_f$ is the fractal dimension (approximately 2.53 for standard percolation in 3D systems).
For compact clusters, $d_f = d$ (the spatial dimension), while fractal clusters have $d_f < d$.
The scaling behavior of $R_g$ versus $s$ reveals whether clusters are ramified (fractal) or compact, providing insight into the mechanisms of cluster growth.
At criticality, clusters are self-similar fractals with characteristic power-law scaling that persists across multiple length scales.
Percolating clusters are excluded to focus on finite cluster geometry.
finalize() -> Dict[str, Dict]Computes average gyration radius and standard deviation for each cluster size across all processed frames.
Calculation Details:
For each cluster size, calculates the mean gyration radius across all observations.
Computes standard deviation using Bessel’s correction (ddof=1) when multiple observations exist.
Idempotent operation: can be called multiple times safely.
Returns dictionary containing concentrations, gyration radii, and standard deviations.
get_result() -> Dict[str, Dict]Returns the finalized analysis results without recalculation.
Provides access to concentrations, gyration radii distributions, and standard deviations for all connectivity types.
print_to_file() -> NoneWrites finalized results to separate CSV-formatted data files for each connectivity type.
Output files:
gyration_radius_distribution-{connectivity}.datin the export directory.Includes header with metadata: date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Cluster_size,Gyration_radius,Standard_deviationData sorted by cluster size in descending order for convenient plotting.
Automatically removes duplicate lines to prevent data redundancy.
_write_header() -> NoneInternal method to write the file header with analysis metadata for each connectivity type.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
LargestClusterSizeAnalyzer
Computes the size of the largest cluster for each connectivity type. Tracks the maximum cluster size per connectivity across frames and provides ensemble-averaged results with standard deviation and error.
Inheritance
Inherits from BaseAnalyzer.
Initialization
LargestClusterSizeAnalyzer(settings: Settings)
Creates an analyzer for computing largest cluster sizes across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_sizes(Dict[str, List[float]]): Raw per-frame largest cluster sizes for each connectivity type._raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity.largest_cluster_sizes(Dict[str, float]): Final mean largest cluster size for each connectivity type.std(Dict[str, float]): Standard deviation of largest cluster sizes across frames.error(Dict[str, float]): Standard error of largest cluster sizes across frames.concentrations(Dict[str, float]): Mean concentration for each connectivity type._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneAnalyzes a single frame to identify the largest cluster size for each connectivity type.
Calculation Methodology:
For each connectivity, examines all clusters (including percolating clusters).
Identifies the maximum cluster size: $S_{\text{max}} = \max{s_1, s_2, \ldots, s_n}$.
Stores raw values for later aggregation across all frames.
Percolation Theory Significance:
The largest cluster size is intimately connected to the percolation order parameter $P_\infty$.
Below the percolation threshold ($p < p_c$), the largest cluster is finite and grows slowly with system size.
At the threshold ($p = p_c$), the largest cluster exhibits critical scaling: $S_{\text{max}} \sim L^{d\beta/\nu}$, where $L$ is the system size, $d$ is dimensionality, and $\beta, \nu$ are critical exponents.
Above threshold ($p > p_c$), the largest cluster becomes the percolating cluster, scaling with system volume.
The largest cluster captures the dominant connectivity pathway and is the primary contributor to transport properties in percolating systems.
Unlike other metrics, this analyzer includes percolating clusters, as they represent the dominant structure above threshold.
finalize() -> Dict[str, Dict[str, float]]Computes final statistics by averaging over all processed frames.
Calculation Details:
Calculates mean largest cluster size across all frames.
Computes standard deviation and error using Bessel’s correction (ddof=1).
Idempotent operation: can be called multiple times safely.
Returns dictionary containing concentrations, largest cluster sizes, standard deviations, and errors.
get_result() -> Dict[str, Dict[str, float]]Returns the finalized analysis results without recalculation.
Provides access to concentrations, largest cluster sizes, standard deviations, and errors for all connectivity types.
print_to_file() -> NoneWrites finalized results to a CSV-formatted data file.
Output file:
largest_cluster_size.datin the export directory.Includes header with metadata: date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Largest_cluster_size,Standard_deviation,Standard_errorAutomatically removes duplicate lines to prevent data redundancy.
_write_header() -> NoneInternal method to write the file header with analysis metadata.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
OrderParameterAnalyzer
Computes the percolation order parameter (P_inf) for each connectivity type. The order parameter is the fraction of networking nodes that belong to the percolating cluster.
Inheritance
Inherits from BaseAnalyzer.
Initialization
OrderParameterAnalyzer(settings: Settings)
Creates an analyzer for computing order parameters across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_order_parameters(Dict[str, List[float]]): Raw per-frame order parameter values for each connectivity type._raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity.order_parameters(Dict[str, float]): Final mean order parameter for each connectivity type.std(Dict[str, float]): Standard deviation of order parameters across frames.concentrations(Dict[str, float]): Mean concentration for each connectivity type._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneAnalyzes a single frame to compute the order parameter for each connectivity type.
Calculation Methodology:
For each connectivity, identifies all percolating clusters.
Selects the largest percolating cluster if multiple exist.
Extracts the first component of the order parameter vector (directional order parameter).
If no percolating cluster exists, assigns order parameter of zero.
Stores raw values for later aggregation across all frames.
Percolation Theory Significance:
The order parameter $P_\infty$ represents the probability that a randomly selected node belongs to the infinite percolating cluster.
Below the percolation threshold ($p < p_c$), no infinite cluster exists and $P_\infty = 0$.
Above threshold ($p > p_c$), the order parameter emerges continuously: $P_\infty \sim (p - p_c)^\beta$, where $\beta$ is the order parameter critical exponent (approximately 0.41 for standard percolation in 3D systems).
This metric is analogous to spontaneous magnetization in the Ising model and serves as the primary signature of the percolation phase transition.
The order parameter distinguishes the non-percolating phase (isolated finite clusters) from the percolating phase (system-spanning connectivity).
The directional component reflects percolation along a specific spatial direction, relevant for anisotropic systems or directional transport properties.
finalize() -> Dict[str, Dict[str, float]]Computes final statistics by averaging over all processed frames.
Calculation Details:
Calculates mean order parameter across all frames.
Computes standard deviation using Bessel’s correction (ddof=1).
Idempotent operation: can be called multiple times safely.
Returns dictionary containing concentrations, order parameters, and standard deviations.
get_result() -> Dict[str, Dict[str, float]]Returns the finalized analysis results without recalculation.
Provides access to concentrations, order parameters, and standard deviations for all connectivity types.
print_to_file() -> NoneWrites finalized results to a CSV-formatted data file.
Output file:
order_parameter.datin the export directory.Includes header with metadata: date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Order_parameter,Standard_deviationAutomatically removes duplicate lines to prevent data redundancy.
_write_header() -> NoneInternal method to write the file header with analysis metadata.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
PercolationProbabilityAnalyzer
Computes the percolation probability (Pi) for each connectivity type. The percolation probability is the fraction of frames in which at least one cluster spans the simulation box in all three dimensions. A value of 1.0 indicates percolation in every frame; 0.0 indicates it never occurs.
Inheritance
Inherits from BaseAnalyzer.
Initialization
PercolationProbabilityAnalyzer(settings: Settings)
Creates an analyzer for computing percolation probabilities across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_percolation_prob_x(Dict[str, List[float]]): Raw per-frame binary indicators (1.0 or 0.0) of percolation occurrence for each connectivity type._raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity.percolation_probabilities(Dict[str, float]): Final mean percolation probability for each connectivity type.std(Dict[str, float]): Standard deviation of percolation probabilities across frames.concentrations(Dict[str, float]): Mea concentration for each connectivity type._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneCheck whether percolation occurs for each connectivity in the frame.
Calculation Methodology:
For each connectivity, examines all clusters to detect percolation.
A cluster is considered percolating if it spans the simulation box in all three dimensions (percolation_probability contains “xyz”).
Records binary outcome: 1.0 if percolation occurs, 0.0 otherwise.
Stores raw values for later averaging across all frames.
Percolation Theory Significance:
The percolation probability $\Pi(p)$ represents the ensemble probability of observing system-spanning connectivity at a given occupation probability $p$.
At the percolation threshold $p_c$, $\Pi$ undergoes a sharp transition from 0 to 1 in the thermodynamic limit.
For finite systems, the transition is smoothed over a narrow range around $p_c$, with width scaling as $L^{-1/\nu}$ where $L$ is system size and $\nu$ is the correlation length exponent.
The inflection point of $\Pi(p)$ provides an accurate estimator of the percolation threshold.
Unlike the order parameter $P_\infty$ (which measures the fraction of nodes in the percolating cluster), $\Pi$ is a Boolean property indicating the mere existence of percolation.
Directional percolation probability is particularly relevant for anisotropic systems and transport phenomena where connectivity along specific axes determines material properties.
finalize() -> Dict[str, Dict[str, float]]Computes final statistics by averaging over all processed frames.
Calculation Details:
Calculates mean percolation probability: $\Pi = \frac{1}{N_{\text{frames}}} \sum_{i=1}^{N_{\text{frames}}} \delta_i$, where $\delta_i = 1$ if frame $i$ percolates and 0 otherwise.
Computes standard deviation using Bessel’s correction (ddof=1).
Idempotent operation: can be called multiple times safely.
Returns dictionary containing concentrations, percolation probabilities, and standard deviations.
get_result() -> Dict[str, Dict[str, float]]Returns the finalized analysis results without recalculation.
Provides access to concentrations, percolation probabilities, and standard deviations for all connectivity types.
print_to_file() -> NoneWrites finalized results to a CSV-formatted data file.
Output file:
percolation_probability.datin the export directory.Includes header with metadata: date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Percolation_probability,Standard_deviationAutomatically removes duplicate lines to prevent data redundancy.
_write_header() -> NoneInternal method to write the file header with analysis metadata.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
Analysis checks percolation in all three dimensions.
SpanningClusterSizeAnalyzer
Computes the size of the largest finite (non-percolating) cluster. Tracks the maximum non-percolating cluster size per connectivity across frames.
Inheritance
Inherits from BaseAnalyzer.
Initialization
SpanningClusterSizeAnalyzer(settings: Settings)
Creates an analyzer for computing spanning cluster sizes across trajectory frames.
Parameters
settings(Settings): Configuration settings specifying analysis parameters and output paths.
Attributes
_raw_spanning_sizes(Dict[str, List[float]]): Raw per-frame largest non-percolating cluster sizes for each connectivity type._raw_concentrations(Dict[str, List[float]]): Raw per-frame concentration values for each connectivity.spanning_cluster_sizes(Dict[str, float]): Final mean spanning cluster size for each connectivity type.std(Dict[str, float]): Standard deviation of spanning cluster sizes across frames.error(Dict[str, float]): Standard error of spanning cluster sizes across frames.concentrations(Dict[str, float]): Mean concentration for each connectivity type._finalized(bool): Flag indicating whether final calculations have been performed.
Methods
analyze(frame: Frame, connectivities: List[str]) -> NoneAnalyzes a single frame to identify the largest non-percolating cluster size for each connectivity type.
Calculation Methodology:
For each connectivity, extracts all non-percolating (finite) clusters.
Identifies the maximum cluster size among finite clusters: $S_{\text{span}} = \max{s_1, s_2, \ldots, s_n}$ where clusters do not span the system.
If no finite clusters exist (all clusters percolate), records size as zero.
Stores raw values for later aggregation across all frames.
Percolation Theory Significance:
In the sub-critical regime ($p < p_c$), the spanning cluster represents the largest structural feature and governs finite-size scaling behavior.
As the system approaches the percolation threshold from below, $S_{\text{span}}$ grows rapidly, following a power law: $S_{\text{span}} \sim |p_c - p|^{-1/\sigma}$, where $\sigma$ is related to critical exponents and the fractal dimension in percolation theory.
This metric is complementary to the largest cluster size analyzer—while that includes percolating clusters, the spanning cluster size focuses exclusively on the finite cluster distribution.
Above the threshold, $S_{\text{span}}$ measures the size of the largest finite cluster coexisting with the percolating network, providing insight into the residual finite cluster population.
The spanning cluster often dominates transport and structural properties in the sub-critical regime before system-wide percolation emerges.
finalize() -> Dict[str, Dict[str, float]]Computes final statistics by averaging over all processed frames.
Calculation Details:
Calculates mean spanning cluster size across all frames.
Computes standard deviation and error using Bessel’s correction (ddof=1).
Idempotent operation: can be called multiple times safely.
Returns dictionary containing concentrations, spanning cluster sizes, standard deviations, and errors.
get_result() -> Dict[str, Dict[str, float]]Returns the finalized analysis results without recalculation.
Provides access to concentrations, spanning cluster sizes, standard deviations, and errors for all connectivity types.
print_to_file() -> NoneWrites finalized results to a CSV-formatted data file.
Output file:
spanning_cluster_size.datin the export directory.Includes header with metadata: date, number of frames analyzed, and column descriptions.
Format:
Connectivity_type,Concentration,Spanning_cluster_size,Standard_deviation,Standard_errorAutomatically removes duplicate lines to prevent data redundancy.
_write_header() -> NoneInternal method to write the file header with analysis metadata.
Respects overwrite settings: creates new file or appends to existing based on configuration.
Includes timestamp and frame count for reproducibility.
Parser module (io module)
Utility class for discovering trajectory files in a directory and parsing associated metadata from an info.csv file. Automatically sorts files and validates metadata consistency.
Initialization
Parser(file_location: str, format: str)
Creates a parser instance that scans for trajectory files and loads metadata.
Parameters
file_location(str): Path to either a trajectory file or directory containing trajectory files.format(str): File extension format to search for (e.g.,"xyz","dat").
Raises
ValueError: If the specifiedfile_locationdoes not exist.ValueError: Ifinfo.csvis empty or has mismatched line count with trajectory files.ValueError: Ifinfo.csvis not found in the directory.
Attributes
file_location(str): Path to the file or directory being parsed.format(str): File extension format to filter.files(List[str]): List of discovered trajectory file paths, sorted alphabetically.infos(Dict[str, List[float]]): Dictionary mapping column names frominfo.csvto their values.
Methods
parse() -> List[str]Discovers and collects all trajectory files matching the specified format.
Behavior:
If
file_locationis a file: Scans the parent directory for matching files.If
file_locationis a directory: Scans that directory for matching files.Sorts files alphabetically for consistent ordering.
Returns: List of full file paths (also stored in
self.files).
parse_infos() -> NoneReads and parses the
info.csvmetadata file from the directory.File Format Requirements:
Must be named
info.csvand located infile_locationdirectory.First line: comma-separated column names (header).
Subsequent lines: comma-separated values, one per trajectory file.
Must contain a
project_namecolumn (stored as strings).All other columns stored as floats.
Number of data lines must match number of trajectory files.
Returns: None (populates
self.infosdictionary).Use Case: Associates metadata like probability, density, temperature, or pressure with each trajectory file for plotting and analysis.
Raises:
ValueError: Ifinfo.csvis empty.ValueError: If line count doesn’t match number of trajectory files.ValueError: Ifinfo.csvis not found.
get_files() -> List[str]Returns the list of discovered trajectory files.
Returns: List of full file paths.
If
fileslist is empty, automatically callsparse()first.
get_infos() -> Dict[str, List[float]]Returns the parsed metadata dictionary.
Returns: Dictionary mapping column names to lists of values.
If
infosdictionary is empty, automatically callsparse_infos()first.
Usage Example
from nexus.io.parser import Parser
# Parse trajectory files and metadata
parser = Parser(
file_location="/path/to/trajectories/",
format="xyz"
)
# Get sorted list of trajectory files
trajectory_files = parser.get_files()
print(f"Found {len(trajectory_files)} trajectory files")
# Get associated metadata
metadata = parser.get_infos()
print(f"Metadata columns: {list(metadata.keys())}")
# Access specific metadata
temperatures = metadata.get("temperature", [])
project_names = metadata.get("project_name", [])
# Iterate through files with metadata
for i, (file, temp, name) in enumerate(zip(trajectory_files, temperatures, project_names)):
print(f"{i}: {name} at {temp}K - {file}")
info.csv Format Example
project_name,temperature,pressure,density
system_001,300.0,1.0,0.85
system_002,350.0,1.0,0.82
system_003,400.0,1.0,0.79
Writer module (io module)
WriterFactory
Factory class for creating and managing file writer instances based on writer type. Implements the factory pattern to provide centralized writer instantiation with shared settings.
Initialization
WriterFactory(settings: Settings)
Creates a writer factory instance and registers all available writer types.
Parameters
settings(Settings): Configuration settings shared across all writer instances.
Attributes
_writers(dict): Internal registry mapping writer class names to writer classes._settings(Settings): Configuration settings used to initialize writers.
Registered Writers
The factory automatically registers the following writer types:
ClustersWriter: Writes cluster configuration and unwrapped position data.
LogsWriter: Writes execution logs and run information.
PerformanceWriter: Writes performance metrics and timing data.
Methods
register_writer(writer: BaseWriter) -> NoneRegisters a new writer class in the factory.
Parameters:
writer(BaseWriter): Writer class (not instance) to register.
Returns: None
Stores the writer class using its class name as the key.
get_writer(name: str, mode: str = "all") -> Optional[BaseWriter]Retrieves an instantiated writer of the specified type.
Parameters:
name(str): Name of the writer class to instantiate. Options:"ClustersWriter": For cluster data output."LogsWriter": For execution logs."PerformanceWriter": For performance metrics.
Returns: Instantiated writer object or
Noneif writer name not recognized.Each call creates a new writer instance with the factory’s settings.
ClustersWriter
Writes cluster data to XYZ and bond files, including unwrapped atomic positions for both networking nodes and decorating atoms (e.g., bridging oxygens). Supports multiple output modes for flexible data organization.
Inheritance
Inherits from BaseWriter.
Initialization
ClustersWriter(settings: Settings)
Creates a cluster file writer with specified configuration settings.
Parameters
settings(Settings): Configuration settings controlling output behavior and file paths.
Attributes
_settings(Settings): Configuration settings._clusters(List[Cluster]): List of clusters to write, sorted by size (largest first).
Methods
set_clusters(clusters: List[Cluster]) -> NoneSets the clusters to be written and sorts them by size in descending order.
Parameters:
clusters(List[Cluster]): List of cluster objects to write.
Returns: None
write() -> NoneWrites cluster data according to the configured print mode.
Print Modes (from
settings.clustering.print_mode):"none": No output written."all": All clusters combined into single XYZ and bonds files."connectivity": Clusters grouped by connectivity type into separate files."individual": Each cluster written to its own file.
Returns: None
_write_all() -> NoneWrites all clusters to a single combined XYZ and bonds file.
Output Files:
unwrapped_clusters/all_unwrapped_clusters-frame_{frame_id}.xyzunwrapped_clusters/all_unwrapped_clusters-frame_{frame_id}.bonds
Behavior:
Counts total unique atoms (networking + decorating nodes).
Builds global node ID to local file index mapping.
Writes all cluster atoms sequentially.
Writes bond information using local indices.
_write_connectivity() -> NoneWrites clusters grouped by connectivity type to separate files per connectivity.
Output Files:
unwrapped_clusters/{connectivity}/{connectivity}_unwrapped_clusters-frame_{frame_id}.xyzunwrapped_clusters/{connectivity}/{connectivity}_unwrapped_clusters-frame_{frame_id}.bonds
Behavior:
Groups clusters by connectivity descriptor.
Marks first (largest) cluster in each group as spanning.
Creates subdirectories for each connectivity type.
_write_individual() -> NoneWrites each cluster to its own separate XYZ and bonds files.
Output Files:
unwrapped_clusters/{connectivity}/cluster-frame_{frame_id}-id_{root_id}.xyzunwrapped_clusters/{connectivity}/cluster-frame_{frame_id}-id_{root_id}.bonds
Behavior:
Creates separate files for every cluster.
Organizes files by connectivity type in subdirectories.
_write_header_comment(f: TextIO, cluster: Cluster) -> NoneWrites the extended XYZ format header comment line with lattice and properties.
Parameters:
f(TextIO): Open file handle.cluster(Cluster): Cluster object containing lattice information.
Header Format:
Lattice="{lxx} {lxy} {lxz} {lyx} {lyy} {lyz} {lzx} {lzy} {lzz}" Properties=species:S:1:index:I:1:pos:R:3:cluster_id:I:1:coordination:I:1:percolating:I:1:spanning:I:1
Properties:
species: Element symbol (string)index: Global node ID (integer)pos: Unwrapped position (3 real numbers)cluster_id: Root cluster ID (integer)coordination: Coordination number (integer)percolating: Percolation dimensionality (integer)spanning: Spanning flag (integer, 1 or 0)
_write_cluster_atoms(f: TextIO, cluster: Cluster, start_index: int, unique_ids: List|None = None) -> Dict[int, int]Writes all atoms (networking and decorating nodes) for a cluster to file.
Parameters:
f(TextIO): Open file handle.cluster(Cluster): Cluster object to write.start_index(int): Starting local index for this cluster’s atoms.unique_ids(List|None): Set of already-written node IDs to prevent duplicates. Default:None.
Returns: Dictionary mapping global node IDs to local file indices, and updated unique IDs set.
Atom Line Format:
{symbol} {global_id} {x} {y} {z} {cluster_id} {coordination} {percolation_dims} {spanning}
Behavior:
Writes primary networking nodes first.
Writes decorating atoms (e.g., bridging oxygens) second.
Tracks unique IDs to avoid writing shared decorating atoms multiple times.
Builds mapping from global node IDs to sequential local indices.
_write_cluster_bonds(f: TextIO, cluster: Cluster, id_map: Dict[int, int]) -> NoneWrites bond connectivity information using local file indices.
Parameters:
f(TextIO): Open file handle for bonds file.cluster(Cluster): Cluster object containing linkage information.id_map(Dict[int, int]): Mapping from global node IDs to local file indices.
Returns: None
Bond Line Format:
{symbol1}({local_index1})-{symbol2}({local_index2})
Behavior:
Skips bond writing if clustering criterion is
"distance".Only writes bonds between networking nodes (uses cluster linkages).
Translates global node IDs to local file indices for visualization compatibility.
Output File Formats
XYZ Format:
Line 1: Total atom count
Line 2: Extended XYZ header with lattice and properties
Remaining lines: Atom data (symbol, ID, position, properties)
Bonds Format:
One bond per line in format:
Symbol1(index1)-Symbol2(index2)Uses local file indices (1-based) for visualization software compatibility
Usage Example
config_clustering = c.ClusteringSettings(
...
with_printed_unwrapped_clusters=True,
print_mode="connectivity", # "all", "connectivity", "individual"
...
)
LogsWriter
Writes execution logs containing version information and configuration settings to a text file. Provides a record of analysis runs for reproducibility and debugging.
Inheritance
Inherits from BaseWriter.
Initialization
LogsWriter(settings: Settings)
Creates a logs file writer with specified configuration settings.
Parameters
settings(Settings): Configuration settings to be logged.
Attributes
_settings(Settings): Configuration settings to write to log file.
Methods
write() -> NoneWrites the log file containing version banner and complete settings configuration.
Output File:
{export_directory}/log.txtFile Contents:
ASCII art title banner (from
print_title_to_file)Package version number
Complete string representation of all configuration settings
Returns: None
Use Case: Creates a permanent record of analysis parameters for each run, facilitating result reproducibility and parameter tracking.
Usage Example
# General settings
config_general = c.GeneralSettings(
...
save_logs=True, # Save logs (save logs to export_directory/logs.txt)
...
)
PerformanceWriter
Writes performance metrics to JSON files with datetime serialization support. Stores execution time, memory usage, CPU utilization, and custom metrics for analysis runs.
Inheritance
Inherits from BaseWriter.
Initialization
PerformanceWriter(settings: Settings)
Creates a performance data writer with specified configuration settings.
Parameters
settings(Settings): Configuration settings specifying export directory and output paths.
Attributes
_settings(Settings): Configuration settings controlling file output location.
Methods
write(performance: Performance) -> NoneWrites a
Performanceobject to a JSON file with proper datetime serialization.Parameters:
performance(Performance): Performance metrics object to serialize and write.
Returns: None
Output File:
{export_directory}/performance_{performance.name}.jsonBehavior:
Converts the
Performancedataclass to a dictionary usingasdict().Serializes all
datetimeobjects to ISO 8601 format strings.Handles datetime conversion in both main timestamp and history entries.
Writes formatted JSON with 2-space indentation for readability.
JSON Structure: Mirrors the
Performancedataclass structure with converted datetime fields.
serialize_datetime(obj)(internal helper)Converts
datetimeobjects to ISO 8601 string format.Parameters:
obj: Object to potentially convert.
Returns: ISO format string if
objisdatetime, otherwise returnsobjunchanged.Use: Enables JSON serialization of datetime objects.
File Format
Format: JSON with 2-space indentation
Datetime Serialization: ISO 8601 format (
YYYY-MM-DDTHH:MM:SS.ffffff)Encoding: UTF-8
Extension:
.json
Reader module (io module)
ReaderFactory
Factory class for creating and managing file reader instances based on file type detection. Implements the factory pattern to provide automatic reader selection based on file extensions.
Initialization
ReaderFactory(settings: Settings)
Creates a reader factory instance and registers all available reader types.
Parameters
settings(Settings): Configuration settings containing file location and other parameters.
Attributes
_readers(dict): Internal registry mapping file extensions to reader instances._settings(Settings): Configuration settings shared across reader instances.
Registered Readers
The factory automatically registers the following reader types:
XYZReader: Handles extended XYZ format trajectory files (
.xyz).LAMMPSReader: Handles LAMMPS dump format trajectory files (
.lammpstrj).
Methods
register_reader(reader: BaseReader) -> NoneRegisters a new reader instance in the factory.
Parameters:
reader(BaseReader): Reader instance to register.
Returns: None
Behavior:
Tests reader with dummy filenames for common extensions (
.xyz,.lammpstrj,.other).Associates the first matching extension with the reader instance.
Allows dynamic reader registration for extensibility.
get_reader() -> Optional[BaseReader]Returns the appropriate reader instance for the file specified in settings.
Returns: Reader instance supporting the file type, or
Noneif no compatible reader found.Behavior:
Validates that the file path in settings exists.
Iterates through registered readers testing file detection.
Returns the first reader that successfully detects the file format.
Raises:
ValueError: If the file specified insettings.file_locationdoes not exist.
Extensibility
To add support for new file formats:
Create a new reader class inheriting from
BaseReaderImplement the
detect()method to identify compatible filesRegister the reader in
ReaderFactory.__init__():
class CustomReader(BaseReader):
def detect(self, filename: str) -> bool:
return filename.endswith('.custom')
# Implement other BaseReader methods...
# In ReaderFactory.__init__():
self.register_reader(CustomReader(settings))
Design Pattern
The factory pattern provides:
Automatic format detection: No manual reader selection required
Centralized management: All readers registered in one location
Easy extension: New readers added without modifying client code
Type safety: Returns
Optional[BaseReader]for type checking
XYZReader
Reader for parsing extended XYZ format trajectory files. Efficiently scans and indexes frames for random access, supporting large trajectory files with frame-level seeking capabilities.
Inheritance
Inherits from BaseReader.
Initialization
XYZReader(settings: Settings)
Creates an XYZ file reader with specified configuration settings.
Parameters
settings(Settings): Configuration settings containing file location, lattice parameters, and other options.
Attributes
frame_indices(List[FrameIndex]): List of frame metadata objects containing frame IDs, node counts, lattice matrices, and byte offsets.num_frames(int): Total number of frames indexed in the trajectory file.is_indexed(bool): Flag indicating whether the file has been scanned and indexed.
Named Tuple: FrameIndex
FrameIndex = namedtuple("FrameIndex", ["frame_id", "num_nodes", "lattice", "byte_offset"])
Stores metadata for fast frame access:
frame_id(int): Sequential frame identifier.num_nodes(int): Number of atoms in the frame.lattice(np.ndarray): 3×3 lattice matrix from frame header.byte_offset(int): File position (in bytes) where the frame begins.
Methods
detect(filepath: str) -> boolDetermines if the reader supports the specified file format.
Parameters:
filepath(str): Path to the file to check.
Returns:
Trueif file has.xyzextension (case-insensitive),Falseotherwise.Use: Enables automatic reader selection by
ReaderFactory.
scan() -> List[FrameIndex]Scans the entire trajectory file to build an index of all frames.
Returns: List of
FrameIndexobjects for all frames in the file.Behavior:
Opens file with buffered I/O for efficient sequential reading.
For each frame:
Records byte offset of frame start.
Parses number of atoms from first line.
Extracts lattice matrix from extended XYZ header.
Skips atomic data lines to reach next frame.
Creates
FrameIndexobject with metadata.
Updates
frame_indices,num_frames, andis_indexedattributes.Prints summary message if verbose mode enabled.
Extended XYZ Format Requirements:
Line 1: Number of atoms (integer).
Line 2: Comment line containing
Lattice="lxx lxy lxz lyx lyy lyz lzx lzy lzz".Subsequent lines: Atomic data (symbol x y z …).
Raises:
IOError: If frame header is malformed or parsing fails.FileNotFoundError: If trajectory file does not exist.
Performance: Single-pass sequential scan with buffered I/O for large files.
parse(frame_id: int) -> Generator[Frame, None, None]Parses a specific frame from the trajectory file and yields a
Frameobject.Parameters:
frame_id(int): Zero-based index of the frame to parse.
Yields:
Frameobject containing atomic symbols, positions, and lattice information.Behavior:
Calls
scan()if file not yet indexed.Retrieves frame metadata from
frame_indices[frame_id].Seeks directly to frame’s byte offset in file.
Parses lattice matrix from header.
Applies custom lattice if
settings.lattice.apply_custom_latticeisTrue, otherwise uses lattice from file.Skips 2 header lines.
Reads atomic data:
Extracts symbol (element) and x, y, z coordinates.
Stores as lists in data dictionary.
Constructs and yields
Frameobject with parsed data.
Raises:
ValueError: If atomic data line format is invalid (must have at least 4 fields: symbol, x, y, z).
Generator Pattern: Uses generator for memory efficiency with large trajectories.
Extended XYZ Format
Expected file structure for each frame:
{num_atoms}
Lattice="{lxx} {lxy} {lxz} {lyx} {lyy} {lyz} {lzx} {lzy} {lzz}" Properties=...
{symbol} {x} {y} {z} [additional properties...]
{symbol} {x} {y} {z} [additional properties...]
...
Example:
3
Lattice="10.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 10.0" Properties=species:S:1:pos:R:3
Si 1.234 2.345 3.456
O 4.567 5.678 6.789
Si 7.890 8.901 9.012
Usage Example
# Import necessary modules
from nexus import SettingsBuilder, main
import nexus.config.settings as c
# Path to the trajectory file which ends with the extension `xyz`
path = "./examples/inputs/example-SiO2-27216at.xyz"
# General settings
config_general = c.GeneralSettings(
project_name="example-SiO2", # Project name
export_directory="./examples/outputs", # Export directory
file_location=path, # File location
range_of_frames=(0, 0), # Range of frames
apply_pbc=True, # Apply periodic boundary conditions
verbose=True, # Verbose mode (if True, print title, progress bars, etc.)
save_logs=True, # Save logs (save logs to export_directory/logs.txt)
save_performance=True, # Save performance (save performance data to export_directory/performance...json)
)
...
Performance Characteristics
Indexing: O(n) single-pass scan where n is the number of frames.
Frame Access: O(1) direct seek using byte offsets.
Memory: Minimal memory footprint—stores only frame metadata, not atomic data.
File I/O: Buffered reading for efficient disk access.
Scalability: Handles trajectories with thousands of frames efficiently.
LAMMPSReader
Reader for parsing LAMMPS trajectory dump files. Efficiently scans and indexes frames for random access, supporting multiple LAMMPS file extensions and custom column layouts.
Inheritance
Inherits from BaseReader.
Initialization
LAMMPSReader(settings: Settings)
Creates a LAMMPS file reader with specified configuration settings.
Parameters
settings(Settings): Configuration settings containing file location, lattice parameters, and other options.
Attributes
frame_indices(List[FrameIndex]): List of frame metadata objects containing frame IDs, node counts, lattice matrices, and byte offsets.num_frames(int): Total number of frames indexed in the trajectory file.is_indexed(bool): Flag indicating whether the file has been scanned and indexed.columns(dict): Mapping of column names to their indices in atomic data lines.
Named Tuple: FrameIndex
FrameIndex = namedtuple("FrameIndex", ["frame_id", "num_nodes", "lattice", "byte_offset"])
Stores metadata for fast frame access:
frame_id(int): Sequential frame identifier.num_nodes(int): Number of atoms in the frame.lattice(np.ndarray): 3×3 lattice matrix derived from box bounds.byte_offset(int): File position (in bytes) where the frame begins.
Methods
detect(filepath: str) -> boolDetermines if the reader supports the specified file format.
Parameters:
filepath(str): Path to the file to check.
Returns:
Trueif file has.lammpstrj,.lammps, or.dataextension (case-insensitive),Falseotherwise.Use: Enables automatic reader selection by
ReaderFactory.
scan() -> List[FrameIndex]Scans the entire LAMMPS trajectory file to build an index of all frames.
Returns: List of
FrameIndexobjects for all frames in the file.Behavior:
Opens file with buffered I/O for efficient sequential reading.
For each frame:
Records byte offset of frame start.
Validates
ITEM: TIMESTEPheader to identify frame boundaries.Parses number of atoms from
ITEM: NUMBER OF NODESsection.Extracts box bounds from
ITEM: BOX BOUNDSsection.Constructs orthogonal 3×3 lattice matrix from box dimensions: $l_{ii} = \text{high} - \text{low}$.
Parses column header from
ITEM: ATOMSline to build column index mapping.Skips atomic data lines to reach next frame.
Creates
FrameIndexobject with metadata.
Updates
frame_indices,num_frames, andis_indexedattributes.Prints summary message if verbose mode enabled.
LAMMPS Format Requirements:
ITEM: TIMESTEPmarker.ITEM: NUMBER OF NODESfollowed by atom count.ITEM: BOX BOUNDSfollowed by 3 lines withlow highbounds.ITEM: ATOMSfollowed by column names (e.g.,id type x y z).
Raises:
IOError: If frame header is malformed or parsing fails.FileNotFoundError: If trajectory file does not exist.
Limitations: Assumes orthogonal (non-triclinic) simulation boxes.
parse(frame_id: int) -> Generator[Frame, None, None]Parses a specific frame from the LAMMPS trajectory file and yields a
Frameobject.Parameters:
frame_id(int): Zero-based index of the frame to parse.
Yields:
Frameobject containing atomic types, positions, and lattice information.Behavior:
Calls
scan()if file not yet indexed.Retrieves frame metadata from
frame_indices[frame_id].Seeks directly to frame’s byte offset in file.
Skips 9 header lines (timestep, number of atoms, box bounds, and atoms header).
Uses
columnsdictionary to locate type, x, y, z fields in atomic data.Reads atomic data:
Extracts atom type and x, y, z coordinates using column indices.
Stores as lists in data dictionary.
Constructs and yields
Frameobject with parsed data.
Raises:
ValueError: If atomic data line format is invalid.
Generator Pattern: Uses generator for memory efficiency with large trajectories.
Column Flexibility: Adapts to different column orderings in LAMMPS dump files.
LAMMPS Dump Format
Expected file structure for each frame:
ITEM: TIMESTEP
{timestep}
ITEM: NUMBER OF ATOMS
{num_atoms}
ITEM: BOX BOUNDS pp pp pp
{xlo} {xhi}
{ylo} {yhi}
{zlo} {zhi}
ITEM: ATOMS id type x y z [additional properties...]
{id} {type} {x} {y} {z} ...
{id} {type} {x} {y} {z} ...
...
Example:
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 1.234 2.345 3.456
2 2 4.567 5.678 6.789
3 1 7.890 8.901 9.012
Usage Example
from io.reader import LAMMPSReader
from config.settings import Settings
# Path to the trajectory file which ends with the extension `.lammpstrj`
settings = Settings(
file_location="trajectory.lammpstrj",
range_of_frames=(0, 100),
verbose=True
)
# Create reader and scan file
reader = LAMMPSReader(settings)
reader.scan()
print(f"Total frames: {reader.num_frames}")
print(f"Column layout: {reader.columns}")
# Parse specific frames
for frame_id in [0, 10, 50]:
frame_gen = reader.parse(frame_id)
frame = next(frame_gen)
print(f"Frame {frame_id}:")
print(f" Atoms: {len(frame.nodes)}")
print(f" Lattice:\n{frame.lattice}")
Performance Characteristics
Indexing: O(n) single-pass scan where n is the number of frames.
Frame Access: O(1) direct seek using byte offsets.
Memory: Minimal memory footprint—stores only frame metadata, not atomic data.
File I/O: Buffered reading for efficient disk access.
Scalability: Handles trajectories with thousands of frames efficiently.
Limitations
Box Geometry: Currently supports only orthogonal (non-triclinic) simulation boxes. Triclinic boxes with xy, xz, yz tilts are not yet supported.
Column Detection: Assumes standard LAMMPS dump format with
ITEM: ATOMSheader containing column names.Atom Types: Stores atom types as strings. Conversion to element symbols requires external mapping.
utils Module
aesthetics module
Utility functions for terminal output, color gradients, and file deduplication.
Functions
print_title(__version__: str) -> NonePrints an ASCII art title banner with version information to the console.
Uses colorama for colored terminal output (light blue for banner).
Parameters:
__version__(str): Version string of the package.
Returns: None
Displays formatted banner and version in the terminal.
print_title_to_file(__version__: str, path: str) -> NoneWrites an ASCII art title banner with version information to a file.
Identical to
print_titlebut outputs to a file instead of console.Parameters:
__version__(str): Version string of the package.path(str): File path where the title should be written.
Returns: None
Creates or overwrites the specified file with the banner and version.
generate_color_gradient(num_iterations: int) -> List[Tuple[int, int, int]]Generates a gradient of RGB colors interpolating from red to blue.
Used for dynamic color updates in progress bars (tqdm).
Parameters:
num_iterations(int): Number of color steps in the gradient.
Returns: List of RGB tuples, each containing three integers (0-255).
Behavior:
If
num_iterations == 0: Returns single start color (red).If
num_iterations == 1: Returns both start and end colors (red and blue).Otherwise: Returns smooth HSV-interpolated gradient from red to blue.
remove_duplicate_lines(filepath: str) -> NoneRemoves duplicate lines from a file while preserving line order.
Reads file, identifies unique lines using
OrderedDict, and rewrites the file.Parameters:
filepath(str): Path to the file to process.
Returns: None
Use Case: Prevents redundant data entries in output files when appending analysis results across multiple runs.
Preserves first occurrence of each unique line and maintains original ordering.
Module Exports
__all__ = [
'print_title',
'generate_color_gradient',
'remove_duplicate_lines',
'print_title_to_file'
]
geometry module
Numba-accelerated geometric calculations for periodic systems: coordinate transformations, distance and angle computations, and gyration radius.
Functions
wrap_position(position: np.ndarray, lattice: np.ndarray) -> np.ndarrayWraps a single position vector into the primary periodic cell defined by the lattice matrix.
JIT-compiled with Numba for high performance (nopython mode, caching, fast math).
Parameters:
position(np.ndarray): 3D Cartesian position vector to wrap.lattice(np.ndarray): 3×3 lattice matrix defining the periodic cell.
Returns: Wrapped position vector inside the primary cell.
Method: Converts to fractional coordinates, applies modulo operation via floor subtraction, converts back to Cartesian.
Reference: Fractional coordinates on Wikipedia
wrap_positions(positions: np.ndarray, lattice: np.ndarray) -> np.ndarrayWraps multiple position vectors into the primary periodic cell.
JIT-compiled with Numba for vectorized high-performance computation.
Parameters:
positions(np.ndarray): N×3 array of Cartesian position vectors.lattice(np.ndarray): 3×3 lattice matrix defining the periodic cell.
Returns: N×3 array of wrapped position vectors.
Method: Iteratively applies
wrap_positionlogic to each position in the array.
calculate_direct_distance(position1: np.ndarray, position2: np.ndarray) -> floatComputes the Euclidean distance between two positions in direct space (no periodic boundaries).
JIT-compiled with Numba.
Parameters:
position1(np.ndarray): First 3D position vector.position2(np.ndarray): Second 3D position vector.
Returns: Distance as a float.
Formula: $d = |\mathbf{r}_1 - \mathbf{r}_2|$
calculate_pbc_distance(position1: np.ndarray, position2: np.ndarray, lattice: np.ndarray) -> floatComputes the minimum image distance between two positions under periodic boundary conditions.
JIT-compiled with Numba.
Parameters:
position1(np.ndarray): First 3D position vector.position2(np.ndarray): Second 3D position vector.lattice(np.ndarray): 3×3 lattice matrix defining the periodic cell.
Returns: Minimum distance as a float.
Method:
Converts displacement vector to fractional coordinates.
Applies minimum image convention: $\mathbf{f} \leftarrow \mathbf{f} - \text{round}(\mathbf{f})$.
Converts back to Cartesian and computes norm.
Physical Significance: Ensures distances are computed considering the nearest periodic image, essential for accurate neighbor detection in periodic systems.
Reference: Fractional coordinates on Wikipedia
calculate_direct_angle(position1: np.ndarray, position2: np.ndarray, position3: np.ndarray) -> floatCalculates the angle formed by three positions in direct space, with
position2as the vertex.JIT-compiled with Numba.
Parameters:
position1(np.ndarray): First position (end of vector 1).position2(np.ndarray): Vertex position (center).position3(np.ndarray): Third position (end of vector 2).
Returns: Angle in degrees.
Formula: $$ \theta = \arccos\left(\frac{(\mathbf{r}_1 - \mathbf{r}_2) \cdot (\mathbf{r}_3 - \mathbf{r}_2)}{|\mathbf{r}_1 - \mathbf{r}_2| |\mathbf{r}_3 - \mathbf{r}_2|}\right) $$
calculate_pbc_angle(position1: np.ndarray, position2: np.ndarray, position3: np.ndarray, lattice: np.ndarray) -> floatCalculates the angle formed by three positions under periodic boundary conditions.
JIT-compiled with Numba.
Parameters:
position1(np.ndarray): First position (end of vector 1).position2(np.ndarray): Vertex position (center).position3(np.ndarray): Third position (end of vector 2).lattice(np.ndarray): 3×3 lattice matrix.
Returns: Angle in degrees.
Method:
Converts displacement vectors to fractional coordinates.
Applies minimum image convention to both vectors.
Converts back to Cartesian and computes angle using dot product.
Clips cosine value to [-1, 1] to handle numerical precision issues.
Physical Significance: Ensures angle calculations account for nearest periodic images, critical for bond angle analysis in periodic systems.
cartesian_to_fractional(position: np.ndarray, lattice: np.ndarray) -> np.ndarrayConverts Cartesian coordinates to fractional coordinates in the lattice basis.
Parameters:
position(np.ndarray): Cartesian position vector or array of vectors.lattice(np.ndarray): 3×3 lattice matrix.
Returns: Fractional coordinates.
Method: Solves $\mathbf{L}^T \mathbf{f} = \mathbf{r}$ where $\mathbf{L}$ is the lattice, $\mathbf{r}$ is Cartesian, and $\mathbf{f}$ is fractional.
Reference: Fractional coordinates on Wikipedia
fractional_to_cartesian(position: np.ndarray, lattice: np.ndarray) -> np.ndarrayConverts fractional coordinates to Cartesian coordinates.
Parameters:
position(np.ndarray): Fractional position vector or array of vectors.lattice(np.ndarray): 3×3 lattice matrix.
Returns: Cartesian coordinates.
Formula: $\mathbf{r} = \mathbf{f} \cdot \mathbf{L}$ where $\mathbf{L}$ is the lattice matrix.
Reference: Fractional coordinates on Wikipedia
calculate_gyration_radius(positions: np.ndarray, center_of_mass: np.ndarray) -> floatCalculates the radius of gyration for a set of positions about their center of mass.
JIT-compiled with Numba.
Parameters:
positions(np.ndarray): N×3 array of positions.center_of_mass(np.ndarray): 3D center of mass position vector.
Returns: Radius of gyration as a float.
Formula: $$ R_g = \sqrt{\frac{1}{N} \sum_{i=1}^{N} |\mathbf{r}i - \mathbf{r}{\text{COM}}|^2} $$
Physical Significance: Quantifies the spatial extent and compactness of a cluster. Returns 0.0 for empty position arrays.
Module Exports
__all__ = [
'wrap_position',
'wrap_positions',
'calculate_direct_distance',
'calculate_pbc_distance',
'calculate_direct_angle',
'calculate_pbc_angle',
'calculate_gyration_radius',
'cartesian_to_fractional',
'fractional_to_cartesian'
]
Performance Optimization
Most functions use Numba’s JIT compilation with:
nopython=True: Forces compilation to pure machine code without Python interpreter overhead.
cache=True: Caches compiled functions to disk for faster subsequent imports.
fastmath=True: Enables aggressive floating-point optimizations for improved performance.
np.ascontiguousarray(): Ensures memory layout compatibility for optimized operations.
Performance
Stores and tracks performance metrics for pipeline operations. Records execution time, memory usage, and CPU usage for a named operation. Supports custom metrics and historical snapshots for trend analysis.
Attributes
Required Fields:
id(str): Unique identifier for the performance record.name(str): Descriptive name of the operation or component being measured.timestamp(datetime): Timestamp of record creation. Defaults to current time viadatetime.now().
Performance Metrics:
execution_time_ms(Optional[float]): Execution time in milliseconds. Default:None.memory_usage_mb(Optional[float]): Memory consumption in megabytes. Default:None.cpu_usage_percent(Optional[float]): CPU utilization as a percentage. Default:None.
Extensibility:
metrics(Dict[str, Any]): Dictionary for storing additional custom metrics. Default: empty dictionary.history(List[Dict[str, Any]]): List of historical performance snapshots. Default: empty list.
Methods
add_metric(name: str, value: Any) -> NoneAdds a custom metric to the performance record.
Parameters:
name(str): Name of the metric to store.value(Any): Value associated with the metric.
Returns: None
Stores the metric in the
metricsdictionary, allowing flexible tracking of domain-specific performance indicators.
record_history() -> NoneCaptures a snapshot of the current performance state and appends it to the history list.
Returns: None
Snapshot Contents:
Current timestamp
Execution time (if set)
Memory usage (if set)
CPU usage (if set)
Copy of all custom metrics
Useful for tracking performance evolution over multiple iterations or frames.
get_average_execution_time() -> Optional[float]Calculates the mean execution time from all historical records.
Returns: Average execution time in milliseconds, or
Noneif no valid entries exist in history.Filters out
Nonevalues before averaging.
__str__() -> strProvides a human-readable string representation of the performance record.
Returns: Formatted multi-line string showing ID, name, timestamp, core metrics, and additional metrics.
Usage Example
import nexus.config.settings as c
general_settings = c.GeneralSettings(
...
save_performance=True, # Save performance (save performance data to export_directory/performance.json)
...
)
...