matbench_genmetrics.core.utils namespace

Submodules

matbench_genmetrics.core.utils.featurize module

matbench_genmetrics.core.utils.featurize.cdvae_cov_comp_fingerprints(structures: List[Structure], verbose: bool = False)[source]

This function takes a list of pymatgen Structure objects and generates composition fingerprints for each structure using the ElementProperty featurizer from the matminer library.

The featurizer uses a preset of “magpie” to generate a set of element property features. The function also has a verbose mode that, when enabled, provides a running progress bar.

Parameters:
  • structures (List[Structure]) – List of pymatgen Structure objects to be featurized.

  • verbose (bool, optional) – If True, the function shows a running progress bar while featurizing, by default False.

Returns:

A list of lists, where each inner list contains the composition fingerprints for a structure.

Return type:

List[List[float]]

Examples

>>> fingerprints = cdvae_cov_comp_fingerprints(structures, verbose=False)

matbench_genmetrics.core.utils.featurize.cdvae_cov_struct_fingerprints(structures: List[Structure], verbose: bool = False)[source]

This function takes a list of pymatgen Structure objects and generates structure fingerprints for each structure using the CrystalNNFingerprint featurizer from the matminer library.

The featurizer uses a preset of “ops” to generate a set of site fingerprints, and then calculates the mean of these fingerprints. The function also has a verbose mode that, when enabled, provides more detailed output. If a structure fails to featurize, it is replaced with NaN values.

The function is based on an implementation in CDVAE: https://github.com/txie-93/cdvae.
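The mean-pooling and NaN fallback described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `mean_site_fingerprint` and the `site_fingerprint` callable are hypothetical names standing in for a per-site featurizer, and `n_features` is assumed to be known in advance.

```python
import numpy as np


def mean_site_fingerprint(sites, site_fingerprint, n_features):
    """Average per-site fingerprints; fall back to NaNs if featurization fails.

    `site_fingerprint` is a hypothetical callable that maps one site to a
    fixed-length list of floats (a stand-in for a real site featurizer).
    """
    try:
        per_site = np.asarray([site_fingerprint(s) for s in sites], dtype=float)
        if per_site.ndim != 2:
            # Empty or ragged input: treat it as a featurization failure
            raise ValueError("no usable site fingerprints")
        return per_site.mean(axis=0).tolist()
    except Exception:
        # Keep one row per structure so downstream arrays stay aligned
        return [float("nan")] * n_features


def toy_fingerprint(site):
    """Toy featurizer: the site index plus a constant feature."""
    return [float(site), 2.0]


def failing_fingerprint(site):
    """Toy featurizer that always fails."""
    raise RuntimeError("featurization failed")


ok_row = mean_site_fingerprint([0, 1], toy_fingerprint, n_features=2)
nan_row = mean_site_fingerprint([0, 1], failing_fingerprint, n_features=2)
```

Returning a NaN row instead of dropping the structure keeps the output the same length as the input list, which is the behaviour the description implies.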

Parameters:
  • structures (List[Structure]) – List of pymatgen Structure objects to be featurized.

  • verbose (bool, optional) – If True, the function will provide more detailed output, by default False.

Returns:

A list of lists, where each inner list contains the structure fingerprints for a structure.

Return type:

List[List[float]]

Examples

>>> fingerprints = cdvae_cov_struct_fingerprints(structures, verbose=False)

matbench_genmetrics.core.utils.featurize.featurize_comp_struct(structures: List[Structure], material_ids: List | None = None, comp_name='composition', struct_name='structure', material_id_name='material_id', include_pmg_object=False, keep_as_df=False)[source]

This function takes a list of structures and optional material IDs, and generates composition and structure fingerprints using different types of featurizers.

The composition fingerprints are generated using the ElementProperty featurizer from the matminer library, which uses a preset of “magpie” to generate a set of element property features. The structure fingerprints are generated using matminer’s SiteStatsFingerprint featurizer, which uses a CrystalNNFingerprint instance (with a preset of “ops”) to generate site fingerprints, and then calculates the mean of these fingerprints.

The function also allows for customization of the names of the composition, structure, and material ID columns. It can optionally include the pymatgen object in the output and can return the data as a dataframe or as a numpy array.

Parameters:
  • structures (List[Structure]) – List of pymatgen Structure objects to be featurized.

  • material_ids (Optional[List], optional) – List of material IDs corresponding to the structures, by default None.

  • comp_name (str, optional) – Name to use for the composition column, by default “composition”.

  • struct_name (str, optional) – Name to use for the structure column, by default “structure”.

  • material_id_name (str, optional) – Name to use for the material ID column, by default “material_id”.

  • include_pmg_object (bool, optional) – Whether to include the pymatgen object in the output, by default False.

  • keep_as_df (bool, optional) – Whether to keep the output as a dataframe, by default False. If False, the output will be a numpy array.

Returns:

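A tuple of two objects: the first contains the composition fingerprints, and the second contains the structure fingerprints. Each is a dataframe if keep_as_df is True and a numpy array otherwise.

Return type:

Tuple[pd.DataFrame, pd.DataFrame] if keep_as_df is True, otherwise Tuple[np.ndarray, np.ndarray]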

Examples

>>> comp_fingerprints, struct_fingerprints = featurize_comp_struct(structures, material_ids=None, comp_name="composition", struct_name="structure", material_id_name="material_id", include_pmg_object=False, keep_as_df=False)

matbench_genmetrics.core.utils.featurize.mod_petti_contributions(structures: List[Structure])[source]

This function takes a list of pymatgen Structure objects and calculates the modified Pettifor number contributions for each element in the structures.

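The modified Pettifor number is an index that places each element on a one-dimensional chemical scale, so that chemically similar elements receive nearby values; it is a property of the element and does not depend on the structure. The function returns a dataframe sorted by the modified Pettifor number.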

Parameters:

structures (List[Structure]) – List of pymatgen Structure objects for which to calculate the modified Pettifor number contributions.

Returns:

A dataframe with two columns: ‘mod_petti’, which contains the modified Pettifor numbers, and ‘contribution’, which contains the corresponding contributions of each element in the structures. The dataframe is sorted by the ‘mod_petti’ column.

Return type:

pd.DataFrame

Examples

>>> mod_petti_df = mod_petti_contributions(structures)

matbench_genmetrics.core.utils.match module

matbench_genmetrics.core.utils.match.cdvae_cov_compstruct_match_matrix(test_comp_fingerprints: List[List[float]], gen_comp_fingerprints: List[List[float]], test_struct_fingerprints: List[List[float]], gen_struct_fingerprints: List[List[float]], symmetric: bool = False, comp_cutoff: float = 10.0, struct_cutoff: float = 0.4, verbose: bool = False)[source]

This function computes a match matrix between two sets of composition and structure fingerprints (test and generated) based on specified cutoff distances.

Parameters:
  • test_comp_fingerprints (List[List[float]]) – List of test composition fingerprints to be compared. Each fingerprint is a list of floats.

  • gen_comp_fingerprints (List[List[float]]) – List of generated composition fingerprints to be compared. Each fingerprint is a list of floats.

  • test_struct_fingerprints (List[List[float]]) – List of test structure fingerprints to be compared. Each fingerprint is a list of floats.

  • gen_struct_fingerprints (List[List[float]]) – List of generated structure fingerprints to be compared. Each fingerprint is a list of floats.

  • symmetric (bool, optional) – If True, the function will compute a symmetric match matrix, otherwise a list in the style of cdist, by default False.

  • comp_cutoff (float, optional) – The cutoff distance for matching composition fingerprints, by default 10.0.

  • struct_cutoff (float, optional) – The cutoff distance for matching structure fingerprints, by default 0.4.

  • verbose (bool, optional) – If True, the function will provide a running progress bar, by default False.
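One plausible reading of the two cutoffs is that a pair counts as a match only when both its composition distance and its structure distance fall within their respective cutoffs. The sketch below illustrates that reading under stated assumptions: Euclidean distance, a `<=` comparison, and a logical AND of the two boolean matrices. The names `within_cutoff` and `compstruct_match_sketch` are hypothetical and are not part of the package.

```python
import numpy as np


def within_cutoff(a, b, cutoff):
    """Boolean matrix: entry (i, j) is True if ||a[i] - b[j]|| <= cutoff."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Broadcasting yields all pairwise differences, shape (len(a), len(b), n_features)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return dists <= cutoff


def compstruct_match_sketch(test_comp, gen_comp, test_struct, gen_struct,
                            comp_cutoff=10.0, struct_cutoff=0.4):
    """Assumed rule: a pair matches only if both fingerprint types match."""
    comp_match = within_cutoff(test_comp, gen_comp, comp_cutoff)
    struct_match = within_cutoff(test_struct, gen_struct, struct_cutoff)
    return np.logical_and(comp_match, struct_match)


# Toy one-dimensional fingerprints, chosen only to exercise both cutoffs
test_comp, gen_comp = [[0.0], [100.0]], [[5.0], [95.0]]
test_struct, gen_struct = [[0.0], [0.0]], [[0.1], [1.0]]
combined = compstruct_match_sketch(test_comp, gen_comp, test_struct, gen_struct)
```

In this toy case only the first test entry and the first generated entry are close in both composition and structure, so only that entry of `combined` is True.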

matbench_genmetrics.core.utils.match.cdvae_cov_match_matrix(test_fingerprints: List[List[float]], gen_fingerprints: List[List[float]], symmetric: bool = False, cutoff: float = 10.0)[source]

This function computes a match matrix between two sets of fingerprints (test and generated) based on a cutoff distance.

Parameters:
  • test_fingerprints (List[List[float]]) – List of test fingerprints to be compared. Each fingerprint is a list of floats.

  • gen_fingerprints (List[List[float]]) – List of generated fingerprints to be compared. Each fingerprint is a list of floats.

  • symmetric (bool, optional) – If True, the function will compute a symmetric match matrix, else an array in the style of cdist, by default False.

  • cutoff (float, optional) – The cutoff distance for matching fingerprints, by default 10.0.

Returns:

A numpy array representing the match matrix (either squareform(pdist) or cdist style depending on symmetric arg). Each entry is a boolean indicating whether the corresponding pair of fingerprints match (True) or not (False).

Return type:

np.ndarray
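The two layouts mentioned above can be illustrated with a small sketch. It assumes Euclidean distance and a `<=` comparison against the cutoff, and it assumes that the symmetric case compares the test fingerprints against themselves; none of these details are confirmed by this page. SciPy's `cdist` and `squareform(pdist(...))` produce the same distances; plain NumPy broadcasting is used here only to keep the example dependency-light. `match_matrix_sketch` is a hypothetical name.

```python
import numpy as np


def match_matrix_sketch(test_fps, gen_fps, symmetric=False, cutoff=10.0):
    """Threshold pairwise distances into a boolean match matrix."""
    test = np.asarray(test_fps, dtype=float)
    # Assumption: the symmetric layout compares the test set with itself
    other = test if symmetric else np.asarray(gen_fps, dtype=float)
    dists = np.linalg.norm(test[:, None, :] - other[None, :, :], axis=-1)
    return dists <= cutoff


test_fps = [[0.0, 0.0], [3.0, 4.0]]
gen_fps = [[0.0, 1.0], [30.0, 40.0], [3.0, 4.5]]

# cdist-style layout: rows index test fingerprints, columns index generated ones
rect = match_matrix_sketch(test_fps, gen_fps, cutoff=1.5)

# squareform(pdist)-style layout: square and symmetric
square = match_matrix_sketch(test_fps, gen_fps, symmetric=True, cutoff=1.5)
```

Here `rect` has shape (2, 3) and `square` has shape (2, 2); in the square layout the diagonal is True because every fingerprint is at distance zero from itself.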

matbench_genmetrics.core.utils.match.dummy_tqdm(x, **kwargs)[source]

A dummy function that simply returns its input. Used as a placeholder for the tqdm progress bar function when verbose mode is not enabled.

matbench_genmetrics.core.utils.match.get_fingerprint_match_matrix(test_comp_fingerprints: List[List[float]], gen_comp_fingerprints: List[List[float]], test_struct_fingerprints: List[List[float]], gen_struct_fingerprints: List[List[float]], match_type: str = 'cdvae_coverage', symmetric: bool = False, verbose: bool = False, **match_kwargs)[source]

This function computes a match matrix between two sets of composition and structure fingerprints (test and generated) using a specified match function. The match function is determined by the match_type parameter.

Parameters:
  • test_comp_fingerprints (List[List[float]]) – List of test composition fingerprints to be compared. Each fingerprint is a list of floats.

  • gen_comp_fingerprints (List[List[float]]) – List of generated composition fingerprints to be compared. Each fingerprint is a list of floats.

  • test_struct_fingerprints (List[List[float]]) – List of test structure fingerprints to be compared. Each fingerprint is a list of floats.

  • gen_struct_fingerprints (List[List[float]]) – List of generated structure fingerprints to be compared. Each fingerprint is a list of floats.

  • match_type (str, optional) – The type of match function to use, by default “cdvae_coverage”.

  • symmetric (bool, optional) – If True, the function will compute a symmetric match matrix, else an array in the style of cdist, by default False.

  • verbose (bool, optional) – If True, the function will provide a running progress bar, by default False.

Returns:

A 2D numpy array representing the match matrix.

Return type:

np.ndarray

Raises:

ValueError – If the match_type is not recognized.

matbench_genmetrics.core.utils.match.get_structure_match_matrix(test_structures: List[Structure], gen_structures: List[Structure], match_type: str = 'cdvae_coverage', symmetric: bool = False, verbose: bool = False, **match_kwargs)[source]

This function computes a match matrix between two lists of pymatgen Structure objects using a specified match function. The match function is determined by the match_type parameter.

Parameters:
  • test_structures (List[Structure]) – List of pymatgen Structure objects to be compared.

  • gen_structures (List[Structure]) – List of pymatgen Structure objects to be compared.

  • match_type (str, optional) – The type of match function to use, by default “cdvae_coverage”.

  • symmetric (bool, optional) – If True, the function will compute a symmetric match matrix, else an array in the style of cdist, by default False.

  • verbose (bool, optional) – If True, the function will provide a running progress bar, by default False.

Returns:

A 2D numpy array representing the match matrix.

Return type:

np.ndarray

Raises:

ValueError – If the match_type is not recognized.
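Both get_fingerprint_match_matrix and get_structure_match_matrix select a match function from match_type and raise ValueError for unknown values. A dictionary lookup is one common way to write that kind of dispatch; the sketch below is an assumption about the general pattern, not the package's code, and `MATCH_FNS`, `exact_equal`, and `dispatch_match` are hypothetical names.

```python
def exact_equal(test_items, gen_items):
    """Toy match function: entry (i, j) is True when the two items are equal."""
    return [[t == g for g in gen_items] for t in test_items]


# Hypothetical registry mapping match_type strings to match functions
MATCH_FNS = {"exact": exact_equal}


def dispatch_match(test_items, gen_items, match_type="exact", **match_kwargs):
    """Look up the match function by name and forward any extra keyword arguments."""
    if match_type not in MATCH_FNS:
        raise ValueError(
            f"Unrecognized match_type {match_type!r}; expected one of {sorted(MATCH_FNS)}"
        )
    return MATCH_FNS[match_type](test_items, gen_items, **match_kwargs)


matrix = dispatch_match([1, 2], [2, 3])
```

Forwarding `**match_kwargs` lets each match function accept its own options, such as cutoffs, without widening the dispatcher's signature.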

matbench_genmetrics.core.utils.match.get_tqdm(verbose)[source]

Returns the appropriate tqdm function based on the environment and verbosity. If verbose is False, returns a dummy function that passes its input through unchanged.

Parameters:

verbose (bool) – If True, returns the tqdm function. If False, returns a dummy function.
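The pass-through pattern behind dummy_tqdm and get_tqdm can be sketched as below. The names `passthrough` and `pick_progress` are hypothetical, and the quiet fallback when tqdm is not installed is an assumption added for this example rather than documented behaviour.

```python
def passthrough(iterable, **kwargs):
    """Stand-in for tqdm: ignore progress-bar options and return the input unchanged."""
    return iterable


def pick_progress(verbose):
    """Return a tqdm-like callable when verbose, otherwise the pass-through."""
    if not verbose:
        return passthrough
    try:
        # tqdm.auto chooses a notebook or console progress bar automatically
        from tqdm.auto import tqdm
    except ImportError:
        # Assumption for this sketch: degrade quietly if tqdm is unavailable
        return passthrough
    return tqdm


wrap = pick_progress(verbose=False)
squares = [x * x for x in wrap(range(4), desc="squaring")]
```

Because both callables accept an iterable plus keyword arguments, the calling loop is written once and does not branch on verbosity.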

matbench_genmetrics.core.utils.match.structure_matcher(s1: Structure, s2: Structure)[source]

Checks if two pymatgen Structure objects match according to pymatgen’s StructureMatcher criteria with the following settings:

StructureMatcher(stol=0.5, ltol=0.3, angle_tol=10.0)

Parameters:
  • s1 (Structure) – The first structure to compare.

  • s2 (Structure) – The second structure to compare.

Returns:

True if the structures match, False otherwise.

Return type:

bool

matbench_genmetrics.core.utils.match.structure_pairwise_match_matrix(test_structures: List[Structure], gen_structures: List[Structure], match_type: str = 'StructureMatcher', verbose: bool = False, symmetric: bool = False)[source]

This function computes a pairwise match matrix between two lists of pymatgen Structure objects using a specified match function. The match function is determined by the match_type parameter.

Parameters:
  • test_structures (List[Structure]) – List of pymatgen Structure objects to be compared.

  • gen_structures (List[Structure]) – List of pymatgen Structure objects to be compared.

  • match_type (str, optional) – The type of match function to use, by default “StructureMatcher”.

  • verbose (bool, optional) – If True, the function will provide a running progress bar, by default False.

  • symmetric (bool, optional) – If True, the function will compute a symmetric match matrix, else an array in the style of cdist, by default False.
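A pairwise match matrix of this kind can be built by calling a two-argument matcher on every pair. The sketch below is an illustration under assumptions, not the package's implementation: `pairwise_match_sketch` and `close_enough` are hypothetical names, the symmetric case is assumed to compare the test items with each other, and self-comparisons on the diagonal are left as False.

```python
import numpy as np


def pairwise_match_sketch(test_items, gen_items, match_fn, symmetric=False):
    """Fill a boolean matrix by calling match_fn on each pair of items."""
    if symmetric:
        n = len(test_items)
        matrix = np.zeros((n, n), dtype=bool)
        for i in range(n):
            # Only the upper triangle is evaluated; the result is mirrored
            for j in range(i + 1, n):
                matrix[i, j] = matrix[j, i] = match_fn(test_items[i], test_items[j])
        return matrix
    matrix = np.zeros((len(test_items), len(gen_items)), dtype=bool)
    for i, test_item in enumerate(test_items):
        for j, gen_item in enumerate(gen_items):
            matrix[i, j] = match_fn(test_item, gen_item)
    return matrix


def close_enough(a, b):
    """Toy matcher standing in for a structure comparison."""
    return abs(a - b) <= 1


rect = pairwise_match_sketch([0, 5, 6], [1, 10], close_enough)
square = pairwise_match_sketch([0, 5, 6], [1, 10], close_enough, symmetric=True)
```

Evaluating only the upper triangle roughly halves the number of matcher calls in the symmetric case, which matters when each comparison is expensive.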

matbench_genmetrics.core.utils.plotting module

matbench_genmetrics.core.utils.plotting.plot_images(images, nrows, ncols, seed=10, formula_as_title=True)[source]

This function plots a list of images. If the number of images is greater than the number of subplots, it randomly selects a subset of images to plot.

Parameters:
  • images (List[np.ndarray]) – List of images to be plotted. Each image is a numpy array.

  • nrows (int) – Number of rows in the subplot grid.

  • ncols (int) – Number of columns in the subplot grid.

  • seed (int, optional) – Seed for the random number generator, by default 10.

  • formula_as_title (bool, optional) – If True, the reduced formula of the structure corresponding to the image is used as the title of the subplot, by default True.

Returns:

A tuple containing the matplotlib Figure object and an array of Axes objects.

Return type:

Tuple[plt.Figure, np.ndarray]

matbench_genmetrics.core.utils.plotting.plot_structures_2d(structures, nrows, ncols, seed=10, formula_as_title=True)[source]

This function plots 2D representations of a list of pymatgen Structure objects. If the number of structures is greater than the number of subplots, it randomly selects a subset of structures to plot.

Parameters:
  • structures (List[Structure]) – List of pymatgen Structure objects to be plotted.

  • nrows (int) – Number of rows in the subplot grid.

  • ncols (int) – Number of columns in the subplot grid.

  • seed (int, optional) – Seed for the random number generator, by default 10.

  • formula_as_title (bool, optional) – If True, the reduced formula of the structure is used as the title of the subplot, by default True.

Returns:

A tuple containing the matplotlib Figure object and an array of Axes objects.

Return type:

Tuple[plt.Figure, np.ndarray]
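The subset selection that both plotting functions describe (keep everything if it fits the grid, otherwise draw a seeded random subset) can be sketched with the standard library. `pick_for_grid` is a hypothetical helper, and the actual functions may use a different random number generator, so the specific items chosen for a given seed are not expected to agree.

```python
import random


def pick_for_grid(items, nrows, ncols, seed=10):
    """Return at most nrows * ncols items, sampled reproducibly when there are too many."""
    n_slots = nrows * ncols
    items = list(items)
    if len(items) <= n_slots:
        return items
    # A dedicated generator keeps the draw reproducible without touching global state
    rng = random.Random(seed)
    return rng.sample(items, n_slots)


labels = [f"item-{i}" for i in range(10)]
chosen = pick_for_grid(labels, nrows=2, ncols=2)
again = pick_for_grid(labels, nrows=2, ncols=2)
```

With a fixed seed, repeated calls select the same subset, so a figure can be regenerated with identical panels.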