cdl-parser

Parse and validate Crystal Description Language (CDL) expressions. Zero external dependencies. Version 1.3.0.

pip install gemmology-cdl-parser

Functions

# parse_cdl(text: str) → CrystalDescription

Parse a CDL string into a structured CrystalDescription object. Handles comments, definitions, forms, modifications, twins, and phenomena.

from cdl_parser import parse_cdl

desc = parse_cdl("cubic[m3m]:{111}@1.0 + {100}@1.3")

print(desc.system)       # 'cubic'
print(desc.point_group)  # 'm3m'
print(len(desc.forms))   # 2
print(desc.forms[0])     # CrystalForm({111}, scale=1.0)

# With features
desc = parse_cdl("cubic[m3m]:{111}@1.0[phantom:3] + {100}@1.3")
print(desc.forms[0].features)  # [Feature(name='phantom', values=[3])]

# With definitions and references
desc = parse_cdl("""
@base = {111}@1.0 + {100}@1.3
cubic[m3m]:$base
""")
print(desc.definitions)  # [Definition(name='base', ...)]

Parameters:

  • text - CDL expression string

Returns: CrystalDescription object

Raises:

  • ParseError - if the expression has a syntax error
  • ValidationError - if semantically invalid (e.g., wrong point group for system)

# validate_cdl(text: str) → tuple[bool, str | None]

Validate a CDL expression without raising exceptions. Returns a tuple of (is_valid, error_message).
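
The non-raising pattern described above can be sketched as a thin wrapper around a raising parser. This is an illustration only, not the library's implementation: toy_parse, ToyParseError and validate are hypothetical names, and the toy parser checks nothing beyond the presence of a colon.

```python
from __future__ import annotations


class ToyParseError(Exception):
    """Hypothetical stand-in for a parser's syntax error."""


def toy_parse(text: str) -> str:
    # Hypothetical parser: accepts any text that contains a ':' separator.
    if ":" not in text:
        raise ToyParseError("Expected ':' after the crystal system")
    return text.strip()


def validate(text: str) -> tuple[bool, str | None]:
    """Return (True, None) on success or (False, message) on failure."""
    try:
        toy_parse(text)
    except ToyParseError as exc:
        return False, str(exc)
    return True, None


ok = validate("cubic[m3m]:{111}")
bad = validate("cubic")
print(ok)   # (True, None)
print(bad)  # (False, "Expected ':' after the crystal system")
```

Returning the message instead of raising keeps call sites free of try/except, which suits form validation and batch checks.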

from cdl_parser import validate_cdl

valid, error = validate_cdl("cubic[m3m]:{111}")
print(valid)   # True
print(error)   # None

valid, error = validate_cdl("invalid[xyz]:{abc}")
print(valid)   # False
print(error)   # "Expected SYSTEM, got IDENTIFIER ..."

# strip_comments(text: str) → tuple[str, list[str]]

Strip comments from CDL text. Extracts doc comments (#!) and removes block comments (/* ... */) and line comments (#). Returns a tuple of (cleaned text, doc comments).

from cdl_parser import strip_comments

text = """
#! Mineral: Diamond
#! System: Cubic
# This is a comment
cubic[m3m]:{111} /* inline block */
"""
cleaned, doc_comments = strip_comments(text)
print(doc_comments)  # ['Mineral: Diamond', 'System: Cubic']
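
To make the three comment kinds concrete, here is a minimal regex-based sketch of the same idea. It is not the library's code: strip_comments_sketch is a hypothetical name, and it assumes that '#' never appears inside a CDL expression itself.

```python
from __future__ import annotations

import re


def strip_comments_sketch(text: str) -> tuple[str, list[str]]:
    """Illustrative only: separate doc comments from code, drop other comments."""
    # Remove /* ... */ block comments first, including multi-line ones.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)

    doc_comments: list[str] = []
    kept: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#!"):
            # Doc comment: keep the payload, drop the line from the code.
            doc_comments.append(stripped[2:].strip())
            continue
        # Line comment: drop everything from the first '#' onwards.
        code = line.split("#", 1)[0].rstrip()
        if code:
            kept.append(code)
    return "\n".join(kept), doc_comments


sample = """
#! Mineral: Diamond
#! System: Cubic
# This is a comment
cubic[m3m]:{111} /* inline block */
"""
cleaned, docs = strip_comments_sketch(sample)
print(cleaned)  # cubic[m3m]:{111}
print(docs)     # ['Mineral: Diamond', 'System: Cubic']
```

Block comments are removed before the line-by-line pass so that a '#' inside a block comment cannot be mistaken for a line comment.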

Classes

CrystalDescription

The main output of CDL parsing, containing all information needed to generate a crystal visualization.

Attribute      Type                     Description
system         str                      Crystal system name (e.g., "cubic", "trigonal")
point_group    str                      Hermann-Mauguin point group symbol (e.g., "m3m")
forms          list[FormNode]           List of form nodes (CrystalForm or FormGroup)
modifications  list[Modification]       Morphological modifications (elongate, flatten, etc.)
twin           TwinSpec | None          Optional twin specification
phenomenon     PhenomenonSpec | None    Optional optical phenomenon
definitions    list[Definition] | None  Named definitions (@name = expression)
doc_comments   list[str] | None         Doc comments (#! Key: Value)

Methods

# Get a flat list of all CrystalForm objects (flattens FormGroups)
# Features from parent groups are merged into child forms
flat = desc.flat_forms()  # list[CrystalForm]

# Convert to dictionary representation
d = desc.to_dict()

# String representation (reconstructs CDL)
print(str(desc))  # "cubic[m3m]:{111}@1.0 + {100}@1.3"
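
The flattening behaviour of flat_forms() can be pictured with a small recursive sketch. Form and Group are hypothetical stand-ins for CrystalForm and FormGroup, and the merge order (group features first, then the form's own) is an assumption rather than documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Form:  # hypothetical stand-in for CrystalForm
    miller: str
    features: list[str] = field(default_factory=list)


@dataclass
class Group:  # hypothetical stand-in for FormGroup
    forms: list  # Form or Group instances
    features: list[str] = field(default_factory=list)


def flatten(nodes, inherited=()):
    """Depth-first flattening; features of enclosing groups are passed down."""
    flat = []
    for node in nodes:
        if isinstance(node, Group):
            flat.extend(flatten(node.forms, tuple(inherited) + tuple(node.features)))
        else:
            # Copy the form so the original tree is left untouched.
            flat.append(Form(node.miller, list(inherited) + node.features))
    return flat


tree = [Group([Form("{111}"), Form("{100}", ["trigon"])], ["phantom"]), Form("{110}")]
flat_result = [(f.miller, f.features) for f in flatten(tree)]
print(flat_result)
# [('{111}', ['phantom']), ('{100}', ['phantom', 'trigon']), ('{110}', [])]
```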

CrystalForm

Represents a single crystal form (set of symmetry-equivalent faces) with an optional distance scale for truncation.

Attribute  Type                  Description
miller     MillerIndex           Miller index defining the form
scale      float                 Distance scale (default 1.0, larger = more truncated)
name       str | None            Original name if using named form (e.g., "octahedron")
features   list[Feature] | None  Per-form feature annotations
label      str | None            Optional label (e.g., "prism" in prism:{10-10})

FormGroup

A parenthesized group of forms with optional shared features and label. Syntax: (form + form)[shared_features]

Attribute  Type                  Description
forms      list[FormNode]        Child form nodes in this group
features   list[Feature] | None  Shared features applied to all children
label      str | None            Optional group label

FormNode

Type alias: FormNode = CrystalForm | FormGroup. Used as the element type for CrystalDescription.forms.

Feature

Describes growth patterns, surface markings, inclusions, or colour properties on a crystal form.

Attribute  Type                     Description
name       str                      Feature type (phantom, trigon, silk, colour, etc.)
values     list[int | float | str]  Feature values (numbers, identifiers, colour specs)

# Features are parsed from [name:value] syntax on forms
desc = parse_cdl("cubic[m3m]:{111}@1.0[phantom:3, colour:blue]")
form = desc.forms[0]  # CrystalForm
print(form.features)
# [Feature(name='phantom', values=[3]), Feature(name='colour', values=['blue'])]
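
As a rough picture of how a [name:value] block maps onto name/values pairs, the sketch below parses the bracketed text on its own. parse_features and parse_value are hypothetical helpers, and the sketch assumes one value per feature and comma-separated entries, which is all the example above shows.

```python
def parse_value(token):
    """Convert a token to int or float where possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_features(block):
    """Parse '[name:value, name:value]' into a list of (name, values) pairs."""
    inner = block.strip()[1:-1]  # drop the surrounding brackets
    features = []
    for item in inner.split(","):
        name, _, raw = item.strip().partition(":")
        # Assumption: a single value per feature; no value gives an empty list.
        values = [parse_value(raw.strip())] if raw.strip() else []
        features.append((name, values))
    return features


parsed = parse_features("[phantom:3, colour:blue]")
print(parsed)  # [('phantom', [3]), ('colour', ['blue'])]
```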

TwinSpec

Defines how crystal twinning should be applied.

Attribute  Type          Description
law        str | None    Named twin law (spinel, brazil, japan, etc.)
axis       tuple | None  Custom twin axis [x, y, z]
angle      float         Rotation angle in degrees (default 180)
twin_type  str           Type: "contact", "penetration", or "cyclic"
count      int           Number of twin individuals (default 2)

desc = parse_cdl("cubic[m3m]:{111} | twin(spinel)")
print(desc.twin)       # TwinSpec(law='spinel')
print(desc.twin.law)   # 'spinel'
print(desc.twin.count) # 2

Modification

Represents a morphological transformation applied to the crystal shape.

Attribute  Type            Description
type       str             Type: "elongate", "truncate", "taper", "bevel", or "flatten"
params     dict[str, Any]  Parameters (e.g., {"axis": "c", "ratio": 1.5})

desc = parse_cdl("cubic[m3m]:{111} | flatten(c:0.5)")
print(desc.modifications)
# [Modification(type='flatten', params={'axis': 'c', 'ratio': 0.5})]
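
The mapping from flatten(c:0.5) to a type plus a params dictionary can be sketched with one regular expression. This is an assumption-laden illustration: parse_modification is a hypothetical helper, and the axis:ratio argument shape is taken from the single example above, so other modification types may accept different arguments in the real grammar.

```python
import re

# Modification names as listed in this reference.
MODIFICATION_TYPES_SKETCH = {"elongate", "truncate", "taper", "bevel", "flatten"}

# name(axis:ratio), e.g. flatten(c:0.5)
MOD_RE = re.compile(r"^(?P<type>[a-z]+)\((?P<axis>[a-z]):(?P<ratio>\d+(?:\.\d+)?)\)$")


def parse_modification(text):
    """Return (type, params) for a 'name(axis:ratio)' string."""
    match = MOD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a modification: {text!r}")
    if match["type"] not in MODIFICATION_TYPES_SKETCH:
        raise ValueError(f"unknown modification type: {match['type']!r}")
    return match["type"], {"axis": match["axis"], "ratio": float(match["ratio"])}


mod = parse_modification("flatten(c:0.5)")
print(mod)  # ('flatten', {'axis': 'c', 'ratio': 0.5})
```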

PhenomenonSpec

Optical phenomenon specification (asterism, chatoyancy, etc.).

Attribute  Type                          Description
type       str                           Phenomenon type (asterism, chatoyancy, etc.)
params     dict[str, int | float | str]  Parameters (e.g., {"value": 6})

desc = parse_cdl("trigonal[-3m]:{10-10}@1.0 | phenomenon[asterism:6]")
print(desc.phenomenon)       # PhenomenonSpec(type='asterism', params={'value': 6})
print(desc.phenomenon.type)  # 'asterism'

Definition

A named definition that allows reuse via @name = expression and $name references.

Attribute  Type            Description
name       str             Definition name (from @name = expression)
body       list[FormNode]  Parsed form nodes for the definition body
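
One way to picture what a $name reference means is plain text substitution, sketched below. This is only an illustration: the table above says the library stores definition bodies as parsed form nodes, not text. expand_definitions is a hypothetical helper, and it does not resolve references that occur inside other definitions.

```python
import re


def expand_definitions(source):
    """Inline `$name` references using `@name = body` lines (textual sketch)."""
    definitions = {}
    remaining = []
    for line in source.strip().splitlines():
        line = line.strip()
        match = re.match(r"@(\w+)\s*=\s*(.+)", line)
        if match:
            definitions[match.group(1)] = match.group(2).strip()
        elif line:
            remaining.append(line)

    def substitute(ref):
        name = ref.group(1)
        if name not in definitions:
            raise KeyError(f"undefined reference: ${name}")
        return definitions[name]

    return "\n".join(re.sub(r"\$(\w+)", substitute, line) for line in remaining)


expanded = expand_definitions("""
@base = {111}@1.0 + {100}@1.3
cubic[m3m]:$base
""")
print(expanded)  # cubic[m3m]:{111}@1.0 + {100}@1.3
```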

MillerIndex

Represents Miller indices (hkl or hkil for hexagonal/trigonal).

from cdl_parser import MillerIndex

# 3-index notation
mi = MillerIndex(1, 1, 1)
print(mi)            # {111}
print(mi.as_tuple()) # (1, 1, 1)

# 4-index notation (Miller-Bravais)
mi = MillerIndex(1, 0, 1, i=-1)
print(mi)              # {10-11}
print(mi.as_tuple())   # (1, 0, -1, 1)
print(mi.as_3index())  # (1, 0, 1)
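
The two notations are linked by the Miller-Bravais constraint i = -(h + k), so the third index carries no independent information. The sketch below shows the conversion with plain tuples. to_bravais, to_3index and fmt are hypothetical helpers, not part of the MillerIndex API.

```python
def to_bravais(h, k, l):
    """3-index (h, k, l) -> 4-index (h, k, i, l) with i = -(h + k)."""
    return (h, k, -(h + k), l)


def to_3index(h, k, i, l):
    """4-index (h, k, i, l) -> 3-index (h, k, l), checking the constraint."""
    if h + k + i != 0:
        raise ValueError(f"invalid Miller-Bravais index: h + k + i = {h + k + i}")
    return (h, k, l)


def fmt(indices):
    """Compact braces notation; only unambiguous for single-digit indices."""
    return "{" + "".join(str(n) for n in indices) + "}"


four = to_bravais(1, 0, 1)
print(four)              # (1, 0, -1, 1)
print(fmt(four))         # {10-11}
print(to_3index(*four))  # (1, 0, 1)
```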

Exceptions

CDLError

Base exception for all CDL-related errors.

ParseError

Raised when CDL parsing fails due to a syntax error. Contains position information.

from cdl_parser import parse_cdl, ParseError

try:
    parse_cdl("invalid")
except ParseError as e:
    print(e.message)    # Error description
    print(e.position)   # Character position in input
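
A character position is easier to act on when shown against the input. The helper below is a sketch of one way to render it. format_error is a hypothetical function, and it assumes the position is a zero-based character offset, which this reference does not state explicitly.

```python
def format_error(source: str, position: int, message: str) -> str:
    """Render the offending line with a caret under `position`."""
    # Find the bounds of the line that contains the position.
    line_start = source.rfind("\n", 0, position) + 1
    line_end = source.find("\n", position)
    if line_end == -1:
        line_end = len(source)
    line = source[line_start:line_end]
    caret = " " * (position - line_start) + "^"
    return f"{message}\n{line}\n{caret}"


report = format_error("cubic[m3m]{111}", 10, "Expected ':'")
print(report)
# Expected ':'
# cubic[m3m]{111}
#           ^
```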

ValidationError

Raised when CDL is syntactically valid but semantically incorrect (e.g., invalid point group for the given crystal system).

from cdl_parser import parse_cdl, ValidationError

try:
    parse_cdl("cubic[-3m]:{111}")  # -3m is trigonal, not cubic
except ValidationError as e:
    print(e.message)  # "Point group '-3m' not valid for cubic system"
    print(e.field)    # "point_group"
    print(e.value)    # "-3m"

Constants

CRYSTAL_SYSTEMS

Set of the seven crystal system names.

from cdl_parser import CRYSTAL_SYSTEMS

print(CRYSTAL_SYSTEMS)
# {'cubic', 'tetragonal', 'orthorhombic', 'hexagonal', 'trigonal', 'monoclinic', 'triclinic'}

POINT_GROUPS

Dictionary mapping crystal system names to their valid point groups.

from cdl_parser import POINT_GROUPS

print(POINT_GROUPS['cubic'])
# {'m3m', '432', '-43m', 'm-3', '23'}
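
A system-to-point-group mapping of this shape makes the semantic check a set lookup. The sketch below uses a local two-entry table, not the library constant. The cubic set is copied from the output above, while the trigonal spellings other than '-3m' are assumptions about how the library writes those groups. PointGroupError and check_point_group are hypothetical names.

```python
POINT_GROUPS_SKETCH = {
    "cubic": {"m3m", "432", "-43m", "m-3", "23"},
    # Assumed spellings for the five trigonal point groups.
    "trigonal": {"-3m", "3m", "32", "-3", "3"},
}


class PointGroupError(ValueError):
    """Hypothetical stand-in for a semantic validation error."""


def check_point_group(system, point_group):
    """Raise PointGroupError unless point_group belongs to system."""
    valid = POINT_GROUPS_SKETCH.get(system)
    if valid is None:
        raise PointGroupError(f"Unknown crystal system {system!r}")
    if point_group not in valid:
        raise PointGroupError(
            f"Point group {point_group!r} not valid for {system} system"
        )


check_point_group("cubic", "m3m")  # passes silently
try:
    check_point_group("cubic", "-3m")
except PointGroupError as exc:
    message = str(exc)
print(message)  # Point group '-3m' not valid for cubic system
```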

DEFAULT_POINT_GROUPS

Dictionary mapping each crystal system to its default (highest symmetry) point group.

from cdl_parser import DEFAULT_POINT_GROUPS

print(DEFAULT_POINT_GROUPS['cubic'])     # 'm3m'
print(DEFAULT_POINT_GROUPS['trigonal'])  # '-3m'

NAMED_FORMS

Dictionary mapping form names to Miller indices (h, k, l).

from cdl_parser import NAMED_FORMS

print(NAMED_FORMS['octahedron'])     # (1, 1, 1)
print(NAMED_FORMS['cube'])           # (1, 0, 0)
print(NAMED_FORMS['dodecahedron'])   # (1, 1, 0)
print(NAMED_FORMS['rhombohedron'])   # (1, 0, 1)

TWIN_LAWS

Set of recognised twin law names.

from cdl_parser import TWIN_LAWS

print(sorted(TWIN_LAWS))
# ['albite', 'baveno', 'brazil', 'carlsbad', 'dauphine', 'fluorite',
#  'gypsum_swallow', 'iron_cross', 'japan', 'manebach', 'pericline',
#  'spinel', 'spinel_law', 'staurolite_60', 'staurolite_90', 'trilling']

FEATURE_NAMES

Set of recognised feature annotation names (phantom, trigon, silk, colour, etc.).

PHENOMENON_TYPES

Set of recognised phenomenon types (asterism, chatoyancy, adularescence, etc.).

MODIFICATION_TYPES

Set of recognised modification types: elongate, truncate, taper, bevel, flatten.

Lexer/Parser Internals

For advanced use cases (syntax highlighting, custom processing):

from cdl_parser import Lexer, Token, TokenType

lexer = Lexer("cubic[m3m]:{111}")
tokens = lexer.tokenize()
for token in tokens:
    print(f"{token.type.value}: {token.value}")
# SYSTEM: cubic
# LBRACKET: [
# POINT_GROUP: m3m
# RBRACKET: ]
# COLON: :
# LBRACE: {
# INTEGER: 1
# INTEGER: 1
# INTEGER: 1
# RBRACE: }
# EOF: None
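
For readers curious how such a token stream can arise, here is a toy tokenizer that reproduces the listing above for that one input. It is a sketch covering only a small subset of CDL (no scales, negative indices, features or modifiers) and does not reflect the real Lexer's internals. All names in it are hypothetical.

```python
import re

SYSTEMS = {"cubic", "tetragonal", "orthorhombic", "hexagonal",
           "trigonal", "monoclinic", "triclinic"}
PUNCT = {"[": "LBRACKET", "]": "RBRACKET", ":": "COLON",
         "{": "LBRACE", "}": "RBRACE"}
WORD = re.compile(r"[A-Za-z_]+")


def tokenize(text):
    """Return (kind, value) pairs, ending with an EOF token."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        last_kinds = [kind for kind, _ in tokens[-2:]]
        if ch.isspace():
            pos += 1
        elif last_kinds == ["SYSTEM", "LBRACKET"]:
            # Directly after 'system[': read the point group up to ']'.
            end = text.find("]", pos)
            end = len(text) if end == -1 else end
            tokens.append(("POINT_GROUP", text[pos:end]))
            pos = end
        elif ch in PUNCT:
            tokens.append((PUNCT[ch], ch))
            pos += 1
        elif ch.isdigit():
            # Each digit of a Miller index is emitted as its own token.
            tokens.append(("INTEGER", int(ch)))
            pos += 1
        elif ch.isalpha():
            word = WORD.match(text, pos).group()
            tokens.append(("SYSTEM" if word in SYSTEMS else "IDENTIFIER", word))
            pos += len(word)
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    tokens.append(("EOF", None))
    return tokens


toy_tokens = tokenize("cubic[m3m]:{111}")
for kind, value in toy_tokens:
    print(f"{kind}: {value}")
```

The point group is read as a single token because symbols such as m3m or -43m mix letters, digits and signs and would otherwise be split.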