cdl-parser
Parse and validate Crystal Description Language (CDL) expressions. Zero external dependencies. Version 1.3.0.
pip install gemmology-cdl-parser
Functions
parse_cdl(text: str) → CrystalDescription
Parse a CDL string into a structured CrystalDescription object.
Handles comments, definitions, forms, modifications, twins, and phenomena.
from cdl_parser import parse_cdl
desc = parse_cdl("cubic[m3m]:{111}@1.0 + {100}@1.3")
print(desc.system) # 'cubic'
print(desc.point_group) # 'm3m'
print(len(desc.forms)) # 2
print(desc.forms[0]) # CrystalForm({111}, scale=1.0)
# With features
desc = parse_cdl("cubic[m3m]:{111}@1.0[phantom:3] + {100}@1.3")
print(desc.forms[0].features) # [Feature(name='phantom', values=[3])]
# With definitions and references
desc = parse_cdl("""
@base = {111}@1.0 + {100}@1.3
cubic[m3m]:$base
""")
print(desc.definitions) # [Definition(name='base', ...)]
Parameters:
text - CDL expression string
Returns: CrystalDescription object
Raises:
ParseError - if the expression has a syntax error
ValidationError - if semantically invalid (e.g., wrong point group for system)
validate_cdl(text: str) → tuple[bool, str | None]
Validate a CDL expression without raising exceptions. Returns a tuple of
(is_valid, error_message).
from cdl_parser import validate_cdl
valid, error = validate_cdl("cubic[m3m]:{111}")
print(valid) # True
print(error) # None
valid, error = validate_cdl("invalid[xyz]:{abc}")
print(valid) # False
print(error) # "Expected SYSTEM, got IDENTIFIER ..."
strip_comments(text: str) → tuple[str, list[str]]
Strip comments from CDL text. Extracts doc comments (#!),
removes block comments (/* ... */) and line comments (#).
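The three comment rules can be approximated with the standard library alone. The following is a minimal sketch of the idea, not the library's implementation: the name strip_comments_sketch is hypothetical, and it assumes that a '#' never appears inside real CDL syntax and that blank lines can be dropped from the cleaned text.

```python
import re


def strip_comments_sketch(text):
    """Hypothetical stand-in illustrating the three comment rules."""
    # 1. Remove /* ... */ block comments (non-greedy, may span lines).
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)

    doc_comments = []
    kept_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # 2. Doc comments start with '#!' and are collected, not discarded.
        if stripped.startswith("#!"):
            doc_comments.append(stripped[2:].strip())
            continue
        # 3. Plain line comments: drop everything from the first '#'.
        code = line.split("#", 1)[0].rstrip()
        if code.strip():
            kept_lines.append(code)
    return "\n".join(kept_lines), doc_comments


sample = """
#! Mineral: Diamond
#! System: Cubic
# This is a comment
cubic[m3m]:{111} /* inline block */
"""
cleaned, docs = strip_comments_sketch(sample)
print(cleaned)  # cubic[m3m]:{111}
print(docs)     # ['Mineral: Diamond', 'System: Cubic']
```

Block comments are removed first so that a '#' inside a block comment is never mistaken for a line comment.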
from cdl_parser import strip_comments
text = """
#! Mineral: Diamond
#! System: Cubic
# This is a comment
cubic[m3m]:{111} /* inline block */
"""
cleaned, doc_comments = strip_comments(text)
print(doc_comments) # ['Mineral: Diamond', 'System: Cubic']
Classes
CrystalDescription
The main output of CDL parsing, containing all information needed to generate a crystal visualization.
| Attribute | Type | Description |
|---|---|---|
| system | str | Crystal system name (e.g., "cubic", "trigonal") |
| point_group | str | Hermann-Mauguin point group symbol (e.g., "m3m") |
| forms | list[FormNode] | List of form nodes (CrystalForm or FormGroup) |
| modifications | list[Modification] | Morphological modifications (elongate, flatten, etc.) |
| twin | TwinSpec \| None | Optional twin specification |
| phenomenon | PhenomenonSpec \| None | Optional optical phenomenon |
| definitions | list[Definition] \| None | Named definitions (@name = expression) |
| doc_comments | list[str] \| None | Doc comments (#! Key: Value) |
Methods
# Get a flat list of all CrystalForm objects (flattens FormGroups)
# Features from parent groups are merged into child forms
flat = desc.flat_forms() # list[CrystalForm]
# Convert to dictionary representation
d = desc.to_dict()
# String representation (reconstructs CDL)
print(str(desc)) # "cubic[m3m]:{111}@1.0 + {100}@1.3"
CrystalForm
Represents a single crystal form (set of symmetry-equivalent faces) with an optional distance scale for truncation.
| Attribute | Type | Description |
|---|---|---|
| miller | MillerIndex | Miller index defining the form |
| scale | float | Distance scale (default 1.0, larger = more truncated) |
| name | str \| None | Original name if using named form (e.g., "octahedron") |
| features | list[Feature] \| None | Per-form feature annotations |
| label | str \| None | Optional label (e.g., "prism" in prism:{10-10}) |
FormGroup
A parenthesized group of forms with optional shared features and label.
Syntax: (form + form)[shared_features]
| Attribute | Type | Description |
|---|---|---|
| forms | list[FormNode] | Child form nodes in this group |
| features | list[Feature] \| None | Shared features applied to all children |
| label | str \| None | Optional group label |
FormNode
Type alias: FormNode = CrystalForm | FormGroup.
Used as the element type for CrystalDescription.forms.
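Because a FormNode can be either a single form or a nested group, code that consumes CrystalDescription.forms usually has to recurse. The sketch below illustrates that recursion and the merging of group features into child forms. It uses hypothetical stand-in dataclasses (Form, Group) rather than the library's CrystalForm and FormGroup, represents features as plain strings, and assumes parent features are placed before the child's own; the library's flat_forms() may order or merge them differently.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """Hypothetical stand-in for CrystalForm (features as plain strings)."""
    miller: tuple
    features: list = field(default_factory=list)


@dataclass
class Group:
    """Hypothetical stand-in for FormGroup."""
    forms: list
    features: list = field(default_factory=list)


def flatten(nodes, inherited=()):
    """Return a flat list of Form objects, pushing group features down."""
    flat = []
    for node in nodes:
        if isinstance(node, Group):
            # Descend, carrying this group's features to every descendant.
            flat.extend(flatten(node.forms, tuple(inherited) + tuple(node.features)))
        else:
            # Assumption: inherited features come first, then the form's own.
            flat.append(Form(node.miller, list(inherited) + list(node.features)))
    return flat


nested = [
    Group([Form((1, 1, 1), ["trigon"]), Form((1, 0, 0))], features=["phantom"]),
    Form((1, 1, 0)),
]
result = flatten(nested)
for form in result:
    print(form.miller, form.features)
# (1, 1, 1) ['phantom', 'trigon']
# (1, 0, 0) ['phantom']
# (1, 1, 0) []
```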
Feature
Describes growth patterns, surface markings, inclusions, or colour properties on a crystal form.
| Attribute | Type | Description |
|---|---|---|
| name | str | Feature type (phantom, trigon, silk, colour, etc.) |
| values | list[int \| float \| str] | Feature values (numbers, identifiers, colour specs) |
# Features are parsed from [name:value] syntax on forms
desc = parse_cdl("cubic[m3m]:{111}@1.0[phantom:3, colour:blue]")
form = desc.forms[0] # CrystalForm
print(form.features)
# [Feature(name='phantom', values=[3]), Feature(name='colour', values=['blue'])]
TwinSpec
Defines how crystal twinning should be applied.
| Attribute | Type | Description |
|---|---|---|
| law | str \| None | Named twin law (spinel, brazil, japan, etc.) |
| axis | tuple \| None | Custom twin axis [x, y, z] |
| angle | float | Rotation angle in degrees (default 180) |
| twin_type | str | Type: "contact", "penetration", or "cyclic" |
| count | int | Number of twin individuals (default 2) |
desc = parse_cdl("cubic[m3m]:{111} | twin(spinel)")
print(desc.twin) # TwinSpec(law='spinel')
print(desc.twin.law) # 'spinel'
print(desc.twin.count) # 2
Modification
Represents a morphological transformation applied to the crystal shape.
| Attribute | Type | Description |
|---|---|---|
| type | str | Type: "elongate", "truncate", "taper", "bevel", or "flatten" |
| params | dict[str, Any] | Parameters (e.g., {"axis": "c", "ratio": 1.5}) |
desc = parse_cdl("cubic[m3m]:{111} | flatten(c:0.5)")
print(desc.modifications)
# [Modification(type='flatten', params={'axis': 'c', 'ratio': 0.5})]
PhenomenonSpec
Optical phenomenon specification (asterism, chatoyancy, etc.).
| Attribute | Type | Description |
|---|---|---|
| type | str | Phenomenon type (asterism, chatoyancy, etc.) |
| params | dict[str, int \| float \| str] | Parameters (e.g., {"value": 6}) |
desc = parse_cdl("trigonal[-3m]:{10-10}@1.0 | phenomenon[asterism:6]")
print(desc.phenomenon) # PhenomenonSpec(type='asterism', params={'value': 6})
print(desc.phenomenon.type) # 'asterism'
Definition
A named definition that allows reuse via @name = expression
and $name references.
| Attribute | Type | Description |
|---|---|---|
| name | str | Definition name (from @name = expression) |
| body | list[FormNode] | Parsed form nodes for the definition body |
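To make the @name / $name semantics concrete, the sketch below expands references by plain text substitution. This only illustrates what a reference means. The library stores each definition body as parsed form nodes, and nothing here reflects how it resolves them internally. The function name expand_definitions is hypothetical, and definitions that refer to other definitions are not handled.

```python
import re


def expand_definitions(source):
    """Hypothetical helper: inline every $name using '@name = body' lines."""
    definitions = {}
    body_lines = []
    for raw_line in source.strip().splitlines():
        line = raw_line.strip()
        match = re.fullmatch(r"@(\w+)\s*=\s*(.+)", line)
        if match:
            # A definition line: remember its body under its name.
            definitions[match.group(1)] = match.group(2).strip()
        elif line:
            body_lines.append(line)
    body = "\n".join(body_lines)
    # Replace each $name with its body; an unknown name raises KeyError.
    return re.sub(r"\$(\w+)", lambda m: definitions[m.group(1)], body)


expanded = expand_definitions("""
@base = {111}@1.0 + {100}@1.3
cubic[m3m]:$base
""")
print(expanded)  # cubic[m3m]:{111}@1.0 + {100}@1.3
```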
MillerIndex
Represents Miller indices (hkl or hkil for hexagonal/trigonal).
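In four-index (Miller-Bravais) notation the third index is redundant: it always satisfies i = -(h + k). The sketch below shows the conversion in both directions using plain tuples. The function names are hypothetical and independent of the MillerIndex class.

```python
def to_miller_bravais(h, k, l):
    """Convert (h, k, l) to (h, k, i, l) using i = -(h + k)."""
    return (h, k, -(h + k), l)


def to_three_index(h, k, i, l):
    """Convert (h, k, i, l) back to (h, k, l), checking the redundancy rule."""
    if i != -(h + k):
        raise ValueError(f"invalid Miller-Bravais index: i must equal {-(h + k)}, got {i}")
    return (h, k, l)


print(to_miller_bravais(1, 0, 1))   # (1, 0, -1, 1)
print(to_three_index(1, 0, -1, 1))  # (1, 0, 1)
```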
from cdl_parser import MillerIndex
# 3-index notation
mi = MillerIndex(1, 1, 1)
print(mi) # {111}
print(mi.as_tuple()) # (1, 1, 1)
# 4-index notation (Miller-Bravais)
mi = MillerIndex(1, 0, 1, i=-1)
print(mi) # {10-11}
print(mi.as_tuple()) # (1, 0, -1, 1)
print(mi.as_3index()) # (1, 0, 1)
Exceptions
CDLError
Base exception for all CDL-related errors.
ParseError
Raised when CDL parsing fails due to a syntax error. Contains position information.
from cdl_parser import parse_cdl, ParseError
try:
    parse_cdl("invalid")
except ParseError as e:
    print(e.message) # Error description
    print(e.position) # Character position in input
ValidationError
Raised when CDL is syntactically valid but semantically incorrect (e.g., invalid point group for the given crystal system).
from cdl_parser import parse_cdl, ValidationError
try:
    parse_cdl("cubic[-3m]:{111}") # -3m is trigonal, not cubic
except ValidationError as e:
    print(e.message) # "Point group '-3m' not valid for cubic system"
    print(e.field) # "point_group"
    print(e.value) # "-3m"
Constants
CRYSTAL_SYSTEMS
Set of the seven crystal system names.
from cdl_parser import CRYSTAL_SYSTEMS
print(CRYSTAL_SYSTEMS)
# {'cubic', 'tetragonal', 'orthorhombic', 'hexagonal', 'trigonal', 'monoclinic', 'triclinic'}
POINT_GROUPS
Dictionary mapping crystal system names to their valid point groups.
from cdl_parser import POINT_GROUPS
print(POINT_GROUPS['cubic'])
# {'m3m', '432', '-43m', 'm-3', '23'}
DEFAULT_POINT_GROUPS
Dictionary mapping each crystal system to its default (highest symmetry) point group.
from cdl_parser import DEFAULT_POINT_GROUPS
print(DEFAULT_POINT_GROUPS['cubic']) # 'm3m'
print(DEFAULT_POINT_GROUPS['trigonal']) # '-3m'
NAMED_FORMS
Dictionary mapping form names to Miller indices (h, k, l).
from cdl_parser import NAMED_FORMS
print(NAMED_FORMS['octahedron']) # (1, 1, 1)
print(NAMED_FORMS['cube']) # (1, 0, 0)
print(NAMED_FORMS['dodecahedron']) # (1, 1, 0)
print(NAMED_FORMS['rhombohedron']) # (1, 0, 1)
TWIN_LAWS
Set of recognised twin law names.
from cdl_parser import TWIN_LAWS
print(sorted(TWIN_LAWS))
# ['albite', 'baveno', 'brazil', 'carlsbad', 'dauphine', 'fluorite',
# 'gypsum_swallow', 'iron_cross', 'japan', 'manebach', 'pericline',
# 'spinel', 'spinel_law', 'staurolite_60', 'staurolite_90', 'trilling']
FEATURE_NAMES
Set of recognised feature annotation names (phantom, trigon, silk, colour, etc.).
PHENOMENON_TYPES
Set of recognised phenomenon types (asterism, chatoyancy, adularescence, etc.).
MODIFICATION_TYPES
Set of recognised modification types: elongate, truncate, taper, bevel, flatten.
Lexer/Parser Internals
For advanced use cases (syntax highlighting, custom processing):
from cdl_parser import Lexer, Token, TokenType
lexer = Lexer("cubic[m3m]:{111}")
tokens = lexer.tokenize()
for token in tokens:
    print(f"{token.type.value}: {token.value}")
# SYSTEM: cubic
# LBRACKET: [
# POINT_GROUP: m3m
# RBRACKET: ]
# COLON: :
# LBRACE: {
# INTEGER: 1
# INTEGER: 1
# INTEGER: 1
# RBRACE: }
# EOF: None
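As one possible use of the token stream for syntax highlighting, the sketch below wraps each token in an HTML span. To stay self-contained it works on plain (type, value) pairs copied from the output above; with the library, the same pairs could presumably be built from token.type.value and token.value. The function name to_html and the tok-* CSS class names are hypothetical, and whitespace between tokens is not reconstructed.

```python
from html import escape

# (type, value) pairs copied from the lexer output listed above.
token_pairs = [
    ("SYSTEM", "cubic"), ("LBRACKET", "["), ("POINT_GROUP", "m3m"),
    ("RBRACKET", "]"), ("COLON", ":"), ("LBRACE", "{"),
    ("INTEGER", "1"), ("INTEGER", "1"), ("INTEGER", "1"),
    ("RBRACE", "}"), ("EOF", None),
]


def to_html(pairs):
    """Hypothetical helper: wrap each token in a span keyed by its type."""
    parts = []
    for type_name, value in pairs:
        if value is None:
            # Tokens without text (such as EOF) produce no markup.
            continue
        css_class = "tok-" + type_name.lower().replace("_", "-")
        parts.append(f'<span class="{css_class}">{escape(str(value))}</span>')
    return "".join(parts)


markup = to_html(token_pairs)
print(markup)
```

Keying the CSS class on the token type keeps the colour scheme in a stylesheet rather than in the highlighting code.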