entei_core package

entei_core

entei-core: MongoDB collection roots and columnar dict[str, list] materialization.

Exports MongoRoot, mongo_root_to_column_dict, and materialize_root_data for building columnar data from PyMongo collections without a native extension stack.

MongoRoot dataclass

Carrier for a collection plus optional fixed column list for materialization.

Used with mongo_root_to_column_dict to produce a dict[str, list] with one list per top-level field.

Parameters:

collection (Any, required): A PyMongo Collection or a compatible object (e.g. mongomock).

fields (tuple[str, ...] | None, default None): Column order and membership for the output. If None, the columns are the sorted union of top-level keys across all documents. If () (empty tuple), no columns are emitted even when documents exist. If non-empty, names must be unique. For an empty collection with fields set to None, the result has no columns.

Raises:

ValueError: If fields contains duplicate names.
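The validation rule above can be tried without a database. The following is a minimal sketch, assuming a local stand-in that mirrors the MongoRoot listing on this page (in real use the class would be imported from entei_core). The stand-in omits slots=True and spells the annotation with Optional for portability, and placeholder_collection is a hypothetical dummy, since construction never touches the collection.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Optional


@dataclass(frozen=True)  # the class on this page also passes slots=True
class MongoRoot:
    """Stand-in mirroring the MongoRoot listing; not the packaged class."""

    collection: Any
    fields: Optional[tuple[str, ...]] = None

    def __post_init__(self) -> None:
        # Same check as the listing: a set drops duplicates, so lengths differ.
        if self.fields is not None and len(self.fields) != len(set(self.fields)):
            raise ValueError("fields must not contain duplicate names")


# Hypothetical dummy: nothing calls find() at construction time.
placeholder_collection = object()

# Unique names are accepted, and their order is preserved.
root = MongoRoot(placeholder_collection, fields=("name", "age"))

# Duplicate names are rejected when the instance is created.
duplicate_error = None
try:
    MongoRoot(placeholder_collection, fields=("name", "name"))
except ValueError as exc:
    duplicate_error = str(exc)

# frozen=True means the column list cannot be swapped after construction.
is_frozen = False
try:
    root.fields = ("age",)
except FrozenInstanceError:
    is_frozen = True
```

Because the instance is frozen, the duplicate check in __post_init__ cannot be bypassed later by reassigning fields.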

Source code in packages/entei-core/src/entei_core/mongo_root.py
@dataclass(frozen=True, slots=True)
class MongoRoot:
    """Carrier for a collection plus optional fixed column list for materialization.

    Used with :func:`~entei_core.mongo_root_to_column_dict` to produce
    ``dict[str, list]`` with one list per top-level field.

    Parameters
    ----------
    collection:
        A PyMongo :class:`~pymongo.collection.Collection` or compatible (e.g. mongomock).
    fields:
        Column order and membership for output. If ``None``, field names are the
        union of top-level keys in all documents (sorted). If ``()`` (empty tuple),
        no columns are emitted even when documents exist. If non-empty, names must
        be unique. For an **empty** collection with ``fields is None``, the result
        has no columns.

    Raises
    ------
    ValueError
        If ``fields`` contains duplicate names.
    """

    collection: Any
    fields: tuple[str, ...] | None = None

    def __post_init__(self) -> None:
        """Validate ``fields`` invariants."""
        if self.fields is not None and len(self.fields) != len(set(self.fields)):
            raise ValueError("fields must not contain duplicate names")

__post_init__()

Validate fields invariants.

Source code in packages/entei-core/src/entei_core/mongo_root.py
def __post_init__(self) -> None:
    """Validate ``fields`` invariants."""
    if self.fields is not None and len(self.fields) != len(set(self.fields)):
        raise ValueError("fields must not contain duplicate names")

materialize_root_data(data)

Normalize pipeline data: convert a MongoRoot to its columnar dict, and return any other value unchanged.

Parameters:

data (Any, required): Any value. If it is a MongoRoot, the columnar dict from mongo_root_to_column_dict is returned; otherwise data is returned unchanged.

Returns:

Any: The columnar dict, or the original data.
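The dispatch described above can be sketched end to end without MongoDB. This is an illustration under stated assumptions, not the packaged code: ListCollection is a hypothetical stand-in that exposes only find(), and the three definitions below are condensed local copies of the listings on this page so that the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Optional


class ListCollection:
    """Hypothetical stand-in for a collection; only find() is provided."""

    def __init__(self, docs: list[dict[str, Any]]) -> None:
        self._docs = list(docs)

    def find(self):
        return iter(self._docs)


@dataclass(frozen=True)
class MongoRoot:  # trimmed copy of the listing (validation omitted here)
    collection: Any
    fields: Optional[tuple[str, ...]] = None


def mongo_root_to_column_dict(root: MongoRoot) -> dict[str, list[Any]]:
    # Condensed equivalent of the listing: read everything, then build columns.
    docs = list(root.collection.find())
    if root.fields is not None:
        keys = list(root.fields)
    else:
        keys = sorted({k for d in docs for k in d})
    return {k: [d.get(k) for d in docs] for k in keys}


def materialize_root_data(data: Any) -> Any:
    if isinstance(data, MongoRoot):
        return mongo_root_to_column_dict(data)
    return data


root = MongoRoot(ListCollection([{"a": 1}, {"a": 2, "b": 3}]))
columns = materialize_root_data(root)        # MongoRoot: converted to columns

already_columnar = {"a": [1, 2]}
passthrough = materialize_root_data(already_columnar)  # anything else: same object
```

The pass-through branch returns the very same object rather than a copy, so callers can apply materialize_root_data to every pipeline input without first checking its type.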

Source code in packages/entei-core/src/entei_core/_materialize.py
def materialize_root_data(data: Any) -> Any:
    """Normalize pipeline data: columnarize :class:`MongoRoot`, else identity.

    Parameters
    ----------
    data:
        Any value. If it is a :class:`MongoRoot`, returns the columnar dict from
        :func:`mongo_root_to_column_dict`; otherwise returns ``data`` unchanged.

    Returns
    -------
    Any
        Columnar dict or the original ``data``.
    """
    if isinstance(data, MongoRoot):
        return mongo_root_to_column_dict(data)
    return data

mongo_root_to_column_dict(root)

Run find() on root.collection and build aligned column lists.

Reads the entire cursor into memory. Only top-level keys participate; nested documents are values in a single cell.

Parameters:

root (MongoRoot, required): The collection and optional fields (see MongoRoot).

Returns:

dict[str, list]: Keys are field names; each value is that field's column, in document order. A document that lacks a key contributes None to that column.

Notes

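When root.fields is None, keys are inferred from the documents. When it is an empty tuple, the result is {} regardless of document count. When the collection is empty and root.fields is non-empty, every listed field maps to an empty list.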
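The column rules above are easier to see on a small input. The sketch below uses a hypothetical helper, columns_from_docs, which is not part of entei_core. It applies the same key selection and None-filling as the source listing to a plain list of dicts, leaving out the find() call.

```python
from typing import Any, Optional


def columns_from_docs(
    docs: list[dict[str, Any]],
    fields: Optional[tuple[str, ...]] = None,
) -> dict[str, list[Any]]:
    """Hypothetical helper: the listing's column logic without a collection."""
    if fields is not None:
        keys = list(fields)  # fixed order and membership
    else:
        keys = sorted({k for d in docs for k in d})  # sorted union of keys
    # dict.get fills None wherever a document lacks the key.
    return {k: [d.get(k) for d in docs] for k in keys}


docs = [
    {"_id": 1, "name": "a", "tags": {"x": 1}},  # nested dict stays one cell
    {"_id": 2, "age": 30},
]

inferred = columns_from_docs(docs)
# {"_id": [1, 2], "age": [None, 30], "name": ["a", None], "tags": [{"x": 1}, None]}

fixed = columns_from_docs(docs, ("name", "_id"))
# {"name": ["a", None], "_id": [1, 2]}

no_columns = columns_from_docs(docs, ())         # {} even though documents exist
empty_input = columns_from_docs([], ("name",))   # {"name": []}
empty_inferred = columns_from_docs([], None)     # {}
```

Every column has one entry per document, so the lists stay aligned by position even when documents have different keys. A stored null and a missing key both appear as None and cannot be told apart in the output.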

Source code in packages/entei-core/src/entei_core/_materialize.py
def mongo_root_to_column_dict(root: MongoRoot) -> dict[str, list[Any]]:
    """Run ``find()`` on ``root.collection`` and build aligned column lists.

    Reads the entire cursor into memory. Only top-level keys participate; nested
    documents are values in a single cell.

    Parameters
    ----------
    root:
        Collection and optional ``fields`` (see :class:`MongoRoot`).

    Returns
    -------
    dict[str, list]
        Keys are field names; each value is the column in document order.

    Notes
    -----
    When ``root.fields`` is ``None``, keys are inferred from documents. When it is
    an empty tuple, returns ``{}`` for any document count.
    """
    coll = root.collection
    cursor = coll.find()
    docs: list[dict[str, Any]] = list(cursor)
    if not docs:
        keys = list(root.fields) if root.fields is not None else []
        return {k: [] for k in keys}

    if root.fields is not None:
        keys = list(root.fields)
    else:
        key_set: set[str] = set()
        for d in docs:
            key_set.update(d.keys())
        keys = sorted(key_set)

    out: dict[str, list[Any]] = {k: [] for k in keys}
    for d in docs:
        for k in keys:
            out[k].append(d.get(k))
    return out