Comment Trees#

Tree node data structures are used to model submission comment trees.

A general tree node is defined as a data class with a value and children field. The type of value and members of the children list are both generic. Typically though, the children field will always be some list of other tree nodes.

Submission comment tree nodes extend the general tree node data structure and includes a new field: more. This object, if not None, is a ‘MoreComments’ callable. Calling it makes a network request and returns another tree node whose children elements should be treated as a continuation of the original children list. The value field of this new node will be None.

For most nodes in a submission comment tree, the value field will be a Comment object, however it will be a Submission for the root node, and it will be None for nodes that come from evaluating MoreComments callables.

A traversal algorithm is necessary for navigating through a tree structure. RedditWarp doesn’t include any built-in tree traversal methods since there’s no one-size-fits-all approach that covers every need, such as stopping at a certain depth, limiting the number of MoreComments fetches, processing only leaf node comments, and handling network errors when evaluating MoreComments callables.

A submission’s comment tree can be obtained via client.p.comment_tree.fetch().

#!/usr/bin/env python
from __future__ import annotations
from typing import TYPE_CHECKING, Iterator
if TYPE_CHECKING:
    # from typing import MutableSequence, Callable
    from redditwarp.models.comment_tree_SYNC import CommentSubtreeTreeNode
    # from redditwarp.models.comment_tree_SYNC import MoreCommentsTreeNode

# from collections import deque

import redditwarp.SYNC
from redditwarp.models.comment_SYNC import Comment

def traversal(node: CommentSubtreeTreeNode[object]) -> Iterator[tuple[int, Comment]]:
    def traverse(root: CommentSubtreeTreeNode[object], level: int = 0) -> Iterator[tuple[int, Comment]]:
        value = root.value
        if isinstance(value, Comment):
            yield (level, value)

        for child in root.children:
            yield from traverse(child, level + 1)

        if root.more:
            yield from traverse(root.more(), level)

    return traverse(node)

client = redditwarp.SYNC.Client()

tree_node = client.p.comment_tree.fetch(1537771841)
for depth, c in traversal(tree_node):
    print(f"{depth*'.'} u/{c.author_display_name} | {c.body!r}"[:80])
#!/usr/bin/env python
from __future__ import annotations
from typing import TYPE_CHECKING, AsyncIterator
if TYPE_CHECKING:
    # from typing import MutableSequence, Callable, Awaitable
    from redditwarp.models.comment_tree_ASYNC import CommentSubtreeTreeNode
    # from redditwarp.models.comment_tree_ASYNC import MoreCommentsTreeNode

import asyncio
# from collections import deque

import redditwarp.ASYNC
from redditwarp.models.comment_ASYNC import Comment

def traversal(node: CommentSubtreeTreeNode[object]) -> AsyncIterator[tuple[int, Comment]]:
    async def traverse(root: CommentSubtreeTreeNode[object], level: int = 0) -> AsyncIterator[tuple[int, Comment]]:
        value = root.value
        if isinstance(value, Comment):
            yield (level, value)

        for child in root.children:
            async for i in traverse(child, level + 1):
                yield i

        if root.more:
            async for i in traverse(await root.more(), level):
                yield i

    return traverse(node)

async def main() -> None:
    client = redditwarp.ASYNC.Client()

    tree_node = await client.p.comment_tree.fetch(1537771841)
    async for depth, c in traversal(tree_node):
        print(f"{depth*'.'} u/{c.author_display_name} | {c.body!r}"[:80])

asyncio.run(main())

In the preceding code demo, if we ignore the if root.more: part it should look familar as your conventional recursive depth-first search (DFS) traversal algorithm. If we did remove the if root.more: block it would behave as though only the comments immediately visible on the page are processed without clicking on any ‘load more comments’ links.

A DFS traversal of a submission comment tree produces comments in the order in which they appear when reading the comments from top to bottom on the page. A BFS, on the other hand, would yield comments in the order of their level, starting with all top level nodes first, then all second level nodes, then all third level nodes, and so on.

The recursive DFS algorithm works well, is short, and looks very clean. A big drawback though is that Python restricts the call stack size to 1000 levels, making it potentially fragile, even if it’s unlikely you’ll encounter comment trees 1000 levels deep. For this reason, an iterative version may be favoured, however, there are hurdles in creating an iterative version of the algorithm that looks as tidy.

If you attempt to convert the recursive DFS algorithm into an iterative one, you may get an inaccurate algorithm where the order in which the MoreComments callables are evaluated is incorrect. In the iterative version, the MoreComments callables are evaluated as soon as the nodes are encountered, whereas in the recursive version, the MoreComments callable of the innermost nodes are evaluated first, which should be the correct order of evaluation. In the end, it’s technically a non-functional difference, but it does have a noticeable effect on scripts that display comment trees. Namely, the output of an incorrect iterative version will appear to be more jittery and slower than a recursive one.

RedditWarp includes a demonstrational command line tool to visualise Reddit submission comment trees. Try running these two on the command line. The --algo=dfs0 option selects the recursive algorithm and --algo=dfs1 is the iterative one. Do you notice any difference in the pace of output?

python -m redditwarp.cli.comment_tree --algo=dfs0 t44sm0

The longer, ‘correct’, iterative DFS and BFS traversal algorithms are recommended. These are significantly uglier than the shorter iterative versions, but they are more accurate to the correct order in which the MoreComments callables should be evaluated.

The traversal algorithm recipes are illustrated below.

Traversal recipes#

Depth-first, recursive#

(dfs0)

def depth_first_recursive(node: CommentSubtreeTreeNode[object]) -> Iterator[tuple[int, Comment]]:
    def traverse(root: CommentSubtreeTreeNode[object], level: int = 0) -> Iterator[tuple[int, Comment]]:
        value = root.value
        if isinstance(value, Comment):
            yield (level, value)

        for child in root.children:
            yield from traverse(child, level + 1)

        if root.more:
            yield from traverse(root.more(), level)

    return traverse(node)
def depth_first_recursive(node: CommentSubtreeTreeNode[object]) -> AsyncIterator[tuple[int, Comment]]:
    async def traverse(root: CommentSubtreeTreeNode[object], level: int = 0) -> AsyncIterator[tuple[int, Comment]]:
        value = root.value
        if isinstance(value, Comment):
            yield (level, value)

        for child in root.children:
            async for i in traverse(child, level + 1):
                yield i

        if root.more:
            async for i in traverse(await root.more(), level):
                yield i

    return traverse(node)

A recursive DFS. The algorithm looks clean and performs well, except that Python limits the maximum function recursion depth to 1000, so if you have a very deep comment tree you’ll have to use an iterative approach.

Depth-first, iterative, inaccurate#

(dfs1)

def depth_first_iterative_inaccurate(node: CommentSubtreeTreeNode[object]) -> Iterator[tuple[int, Comment]]:
    stack: MutableSequence[CommentSubtreeTreeNode[object]] = deque([node])
    levels = deque([0])
    while stack:
        node = stack.pop()
        level = levels.pop()

        value = node.value
        if isinstance(value, Comment):
            yield (level, value)

        if node.more:
            stack.append(node.more())
            levels.append(level)

        stack.extend(reversed(node.children))
        levels.extend([level + 1] * len(node.children))
async def depth_first_iterative_inaccurate(node: CommentSubtreeTreeNode[object]) -> AsyncIterator[tuple[int, Comment]]:
    stack: MutableSequence[CommentSubtreeTreeNode[object]] = deque([node])
    levels = deque([0])
    while stack:
        node = stack.pop()
        level = levels.pop()

        value = node.value
        if isinstance(value, Comment):
            yield (level, value)

        if node.more:
            stack.append(await node.more())
            levels.append(level)

        stack.extend(reversed(node.children))
        levels.extend([level + 1] * len(node.children))

An iterative DFS that is functionally equivalent to the recursive version but is slightly inaccurate because the MoreComments callables are evaluated before the child nodes are processed. It’s undesirable because for a display script it has the effect of feeling slower and looking more jittery.

Depth-first, iterative, accurate#

(dfs2)

def depth_first_iterative_accurate(node: CommentSubtreeTreeNode[object]) -> Iterator[tuple[int, Comment]]:
    stack: MutableSequence[bool] = deque([True])
    node_stack: MutableSequence[CommentSubtreeTreeNode[object]] = deque([node])
    more_stack: MutableSequence[Callable[[], MoreCommentsTreeNode]] = deque()
    levels = deque([0])
    while stack:
        if stack.pop():
            node = node_stack.pop()
        else:
            node = more_stack.pop()()
        level = levels.pop()

        value = node.value
        if isinstance(value, Comment):
            yield (level, value)

        if node.more:
            stack.append(False)
            more_stack.append(node.more)
            levels.append(level)

        stack.extend([True] * len(node.children))
        node_stack.extend(reversed(node.children))
        levels.extend([level + 1] * len(node.children))
async def depth_first_iterative_accurate(node: CommentSubtreeTreeNode[object]) -> AsyncIterator[tuple[int, Comment]]:
    stack: MutableSequence[bool] = deque([True])
    node_stack: MutableSequence[CommentSubtreeTreeNode[object]] = deque([node])
    more_stack: MutableSequence[Callable[[], Awaitable[MoreCommentsTreeNode]]] = deque()
    levels = deque([0])
    while stack:
        if stack.pop():
            node = node_stack.pop()
        else:
            node = await more_stack.pop()()
        level = levels.pop()

        value = node.value
        if isinstance(value, Comment):
            yield (level, value)

        if node.more:
            stack.append(False)
            more_stack.append(node.more)
            levels.append(level)

        stack.extend([True] * len(node.children))
        node_stack.extend(reversed(node.children))
        levels.extend([level + 1] * len(node.children))

This version is more algorithmically accurate to the recursive one but at the cost of looking messier.

Breadth-first, inaccurate#

(bfs1)

def breadth_first_inaccurate(node: CommentSubtreeTreeNode[object]) -> Iterator[tuple[int, Comment]]:
    level = 0
    queue: MutableSequence[CommentSubtreeTreeNode[object]] = deque([node])
    while queue:
        batch = deque(queue)
        queue.clear()
        while batch:
            node = batch.popleft()
            if node.value is None:
                if node.more:
                    batch.appendleft(node.more())
                batch.extendleft(reversed(node.children))
                continue

            value = node.value
            if isinstance(value, Comment):
                yield (level, value)

            queue.extend(node.children)
            if node.more:
                queue.append(node.more())

        level += 1
async def breadth_first_inaccurate(node: CommentSubtreeTreeNode[object]) -> AsyncIterator[tuple[int, Comment]]:
    level = 0
    queue: MutableSequence[CommentSubtreeTreeNode[object]] = deque([node])
    while queue:
        batch = deque(queue)
        queue.clear()
        while batch:
            node = batch.popleft()
            if node.value is None:
                if node.more:
                    batch.appendleft(await node.more())
                batch.extendleft(reversed(node.children))
                continue

            value = node.value
            if isinstance(value, Comment):
                yield (level, value)

            queue.extend(node.children)
            if node.more:
                queue.append(await node.more())

        level += 1

This BFS traversal evaluates the MoreComments callables before processing the children which, we’ve established is undesirable because it feels slow.

Breadth-first, accurate#

(bfs2)

def breadth_first_accurate(node: CommentSubtreeTreeNode[object]) -> Iterator[tuple[int, Comment]]:
    level = 0
    queue: MutableSequence[bool] = deque([True])
    node_queue: MutableSequence[CommentSubtreeTreeNode[object]] = deque([node])
    more_queue: MutableSequence[Callable[[], MoreCommentsTreeNode]] = deque()
    while queue:
        batch = deque(queue)
        node_batch = deque(node_queue)
        more_batch = deque(more_queue)
        queue.clear()
        node_queue.clear()
        more_queue.clear()
        while batch:
            if batch.popleft():
                node = node_batch.popleft()
            else:
                node = more_batch.popleft()()

            if node.value is None:
                if node.more:
                    batch.appendleft(False)
                    more_batch.appendleft(node.more)
                batch.extendleft([True] * len(node.children))
                node_batch.extendleft(reversed(node.children))
                continue

            value = node.value
            if isinstance(value, Comment):
                yield (level, value)

            queue.extend([True] * len(node.children))
            node_queue.extend(node.children)
            if node.more:
                queue.append(False)
                more_queue.append(node.more)

        level += 1
async def breadth_first_accurate(node: CommentSubtreeTreeNode[object]) -> AsyncIterator[tuple[int, Comment]]:
    level = 0
    queue: MutableSequence[bool] = deque([True])
    node_queue: MutableSequence[CommentSubtreeTreeNode[object]] = deque([node])
    more_queue: MutableSequence[Callable[[], Awaitable[MoreCommentsTreeNode]]] = deque()
    while queue:
        batch = deque(queue)
        node_batch = deque(node_queue)
        more_batch = deque(more_queue)
        queue.clear()
        node_queue.clear()
        more_queue.clear()
        while batch:
            if batch.popleft():
                node = node_batch.popleft()
            else:
                node = await more_batch.popleft()()

            if node.value is None:
                if node.more:
                    batch.appendleft(False)
                    more_batch.appendleft(node.more)
                batch.extendleft([True] * len(node.children))
                node_batch.extendleft(reversed(node.children))
                continue

            value = node.value
            if isinstance(value, Comment):
                yield (level, value)

            queue.extend([True] * len(node.children))
            node_queue.extend(node.children)
            if node.more:
                queue.append(False)
                more_queue.append(node.more)

        level += 1

A BFS that processes all children before evaluating MoreComments callables. Algorithmically better but programmatically much uglier.