Talk:Tree (data structure)

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

B+ trees[edit]

Problem. Are B+ trees are not trees? I don't buy the definition used in the page. 168.28.180.30 (talk) 14:13, 27 November 2017 (UTC)[reply]

I (almost) reverted your edits in the article, since I couldn't see at B+ tree why it shouldn't be subsumed under the definition given here. Could you please explain your above conclusion? - Jochen Burghardt (talk) 23:27, 27 November 2017 (UTC)[reply]

While it is true that no node can be ancestor to itself in a tree, that's also true in some directed acyclic graphs which are not trees, since there are multiple paths between pair of nodes but they don't form a directed cycle (Say a ->b, a->c, b->d, c->d). So this is a necessary but not sufficient condition fro a directed graph to be a tree and doesn't belong in the definition. It's true this article is about the data structure, not the mathematical entity, but I think the same holds, e.g. simple recursive trasversal algorithms would fail if there were multiple paths between nodes. — Preceding unsigned comment added by A1957 (talkcontribs) 18:53, 5 May 2022 (UTC)[reply]

Length of "Mathematical definition" section[edit]

This section is probably more math than is needed for or wanted by a general encyclopedia audience. It might be better presented in a math wikibook? -- Beland (talk) 16:06, 10 August 2020 (UTC)[reply]

Hmm, it's possible some material might be moved to Tree (graph theory), but there is also some material relevant to computer science, such as the different ways nodes of trees might actually be stored and connected. The CS parts could be summarized much more briefly, and without using theoretical mathematics or math symbols. -- Beland (talk) 20:50, 10 August 2020 (UTC)[reply]

The Mathematical definition section contains mathematical definitions of the most elementary structures that are relevant to the notion of tree in the context of computing. Unordered tree is the highest possible abstraction - just a single relation between nodes. Ordered tree is the most important refinemement - just add one more relation between nodes. These two families of structures are absolutely indispensable whenever we want to speak about mathematical definitions relevant to trees in computing. If also naming capabilities of trees are taken into account (like those in file systems or JSON data) then the structure defined in Using names and called nested dictionary is presumably the most important abstraction for the notion of labelled trees.

Therefore, in the mathematical aspect, the presented families of structures (Unordered tree, Ordered tree, Nested dictionary) provide the greatest possible simplification for the subject of trees and labelled trees in computing.

I can think of only one half-valid objection to the word "simplification" used here, namely that the section does not simplify the subject by only considering the finite case. Indeed, imposing the condition of finiteness to the structures (i.e. assuming that the set X of nodes, or the set of A of arrows is finite) would allow to use the notion of acyclicity instead of well-foundedness, to use natural numbers instead of ordinal numbers, as well as to omit the last condition about the sibling order (including the omission of the note that the condition can be omitted). Plus ACC for the child-to-parent relation could be replaced by the acyclicity condition, i.e. by the condition of being a DAG. This gain would not compensate for the loss, the main reason here being the fact that infinite trees have their place in computing.

A tree by Jeremy Gibbons

A tree as depicted on page 37 in Algebras for Tree Algorithms [1]
(see Tree accumulation)

There can also be arguments against the word "indispensable". The structures are indispensable as long as trees are based on the concept of a node - which is the case of the article. In some publications, trees are regared as "terms in some algebra", with no concept of a node in the definitions. However, when the underlying structures are diagrammatized, it turns out that they correspond to accessible pointed graphs in which nodes can share non-leaf children. [a] (This is a different situation than with "arrow trees" where only leaf nodes can be shared - thus there is a tree of arrows.) Apart from that, the mathematical apparatus used in such publications is less elementary than that one used in the article's section.

Since each subsection provides explanation for the "greatest possible simplification", removing any subsection would result in the whole section becoming less explanatory. At present, the only subsection that can be considered >90% excessive (redundant) is Definition using horizontal order. This section has been created mainly in response to the "citation needed" tag added in the 893343983-revision by User:Sderose.

Thus, IMHO, the main question is not about the length of the "Mathematical definition" section, but about the existence of the section. If you think that mathematical definitions have no place in the article (or even in any CS article) then be aware that the article has been listed as one of the 57 vital CS articles in Mathematics. You can also have a look at some posts on this talk page (Is tree a tree?, Are children trees?, Undefined definition) to get an idea about the possible consequences of "demathematizing" this article.

Hundblue (talk) 15:11, 11 August 2020 (UTC)[reply]

I agree that the section is valuable. While the standard inductive definition ("t: v [t[1], ..., t[k]]") is sufficient for many computer science applications, different mathematical definitions (as explained in the section) are used e.g. in term rewriting, in considerations about an algorithm's search space (in particular in automated theorem proving), and -I guess- in graph theory. I suggest to apply the policy WP:CONSPLIT here; our situation appears similar to the example thermal energy vs. heat. I'm not sure about Beland's suggestion to merge the section into tree (graph theory); maybe a graph theorist can comment on that. (I'll be offline for 2 weeks from now on.) - Jochen Burghardt (talk) 17:31, 11 August 2020 (UTC)[reply]
@Hundblue: I don't think it's of much interest to a general audience to see ordered and unordered trees formally defined eight different ways each. It would probably be sufficient to give one definition of each, mentioning the other seven ways and referring readers to mathematics specialty texts if they want details. -- Beland (talk) 18:12, 22 August 2020 (UTC)[reply]

I don't quite understand the term of interest to a general audience. How does such a term apply to e.g. the mathematical articles refered to by the section text (partial order, transitive closure, linear extension, hypergraph, ...)? The "Mathematical definition" section is aimed at those readers who want to see how the notion is described mathematically. As for whether or not the amount of provided definitions is appropriate, the main criteria that are in favor of not reducing the present amount are:

  1. Importance of the article's subject. (The subject is all of Tree (data structure), Tree (computer science), Tree (computing) – everything redirects to a single page.)
  2. Applicability of particular definitions.

I think the subject is of such a fundamental importance to computer science / computing, that multitudes of mathematical descriptions are appropriate. Let us make an overview of the presented definitions (in the (t~...) definitions below, "unordered tree" is often abbreviated to "tree").

  1. Tree as an algebra (X, parent).
    The basic algebraic definition. Used as the introductory definition because it can be expressed in simple prose.
  2. Tree as a partial algebra (X, parent).
    The basic definition that makes use of a function symbol (parent). Obtained from the algebraic definition by removing the root loop.
  3. Tree as a relational structure (X, ≺).
    A candidate for the "canonical" definition with distilled conditions that form a basis for generalizations.
  4. Tree as a partial order (X, ≤).
    The order-theoretic definition exposing unordered trees as a subclass of set-theoretic trees.
  5. Tree as a set system (ℱ, ⊆) ordered by inclusion.
    Set system representation of trees. This definition exposes the semantics of containment hierarchy.
  6. Well-founded tree as a partial order defined recursively.
    Basic recursive construction of trees, with the correspondent stratification presented.
  7. Well-founded tree as a forest-node pair (F, x) defined recursively.
    The most important (w.r.t. computing) recursive construction of trees in which nodes are (still) viewed as abstract entities.
  8. Tree as a multidigraph (X, A, s, t), and the "arrow tree" generalization.
    The multidigraph definition of trees, showing a direct correspondence between reified arrows and hard-links in file systems.
  9. Tree as an unfolded accessible pointed graph / multidigraph.
    Trees as a special case of accessible pointed graphs / multidigraphs. The operation of "unfolding" is presented.
  10. Trees in naming systems (X, Σ, A, s, σ, t).
    The most abstract definition of naming structures from which unordered trees can be extracted. Connections to labelled transition systems, file systems, nested dictionaries as well as JSON data are provided.
  1. Ordered tree as a structure (X, ≤V, ≤S) with two partial orders (vertical + sibling).
    Order-theoretic definition of an ordered tree as an expansion of unordered tree. The hierarchical ordering is equipped by ordering of siblings.
  2. Ordered tree as a structure (X, ≤V, ≤H) with two partial orders (vertical + horizontal).
    Order-theoretic definition, a variant of the above, used in cited sources.
  3. Ordered tree as a partial algebra (X, lc, rs) (left-child + right-sibling).
    A definition with two function symbols, using the correspondence between binary trees and ordered trees.
  4. Ordered tree as a structure (X, ..., ι) with ι being the "sibling-index" function that assigns each (non-root) node x the number of x's preceding siblings.
    This is not (yet) presented as a definition. Instead, it is said that an ordered tree can be "encoded" by sequences of sibling indices.

The only definition that is apparently excessive is (o~2). It is too similar to (o~1) and should perhaps be dissolved into notes about (o~1) so that links to the sources are preserved.

The other candidates for reduction which I can think of are (t~6) and (o~3). The (t~6) is just an inductive construction of (X, ≤) with explicit presentation of induction steps. This serves as a basis for the more important (t~7) where there are no induction steps. Instead, a reference is made to the presentation in (t~6). Thus, removing (t~6) would require some additions to (t~7) so that the accuracy is preserved.

The (o~3) definition (X, lc, rs) of ordered trees can possibly be made "less excessive" by merging it into the previous subsection (Generating structure) and perhaps removing the prescription about how (X, ≤V, ≤S) is recovered.

Making other reductions in the above definitions would result, IMHO, in the "Mathematical definition" section becoming less valuable, that is, less valuable to readers that are interested in mathematical description of the subject.

Hundblue (talk) 14:11, 24 August 2020 (UTC)[reply]

References

  1. ^ Gibbons, Jeremy (1991). Algebras for Tree Algorithms (PDF) (Ph.D.). Oxford University.
  1. ^ Sadly, the diagram in the thesis does not demonstrate this.
@Hundblue: The other articles you cite actually seem to cover their subjects in a brief manner, more appropriate to a general audience. If the section in question was as concise as those articles, I would probably not have been bothered about its length. If the section was as brief as the above list and linked to other articles that explain the relationship, that would also be a reasonable length.
General audience readers include everyone from secondary school programming students who are learning what a tree is for the first time, to non-computer people who randomly heard the term and want to know what it means, to professional programmers who are refreshing their memories about some properties of trees, or using this article as a way to navigate to related topics, like a specific type of tree they want to consider implementing. Some readers will also be PhD mathematicians or theoretical computer scientists. These are the specialty readers for which this article tries to avoid or at least explain the jargon they see in journal articles, and for which it at most provides an overview of concepts with pointers to more detailed works. It should not resemble a chapter in a textbook from this specialty, which is why I suggest specialty content of this length is more appropriate for a wikibook. Yes, such a chapter would be valuable to specialty readers interested in that content, but that doesn't mean it should be found in an encyclopedia. I mean, I took Mathematics for Computer Science at MIT, and this is way, way more theoretic math than we were ever exposed to on this topic or would be expected to ever be interested in reading about should we have a tree-related problem.
Linked list is also level-5 vital article under theoretical computer science. Note that it is B-class, which means it has been lightly audited and is considered to be of higher quality. It has zero sections devoted to mathematical definitions. User:Basbodart has also said that the mathematical definitions do not belong in this article; this is a long-standing complaint which needs to be addressed. The three talk page sections you highlight can be answered succinctly with by explaining in the article the relationship of this type of tree to Tree (graph theory). Apparently that is actually someone different. Given that is apparently not an appropriate place to offload the math definitions from this article, I will attempt to pull out the programming-related examples and the context necessary to understand them, and cut the rest. -- Beland (talk) 09:08, 2 April 2022 (UTC)[reply]

It seems to me that you brought up the question of existence of the Mathematical definition section. This question can be dissolved into the following fundamental questions:

  1. Does the CS concept of a tree have a mathematical definition?
  2. Should the CS concept of a tree be mathematically defined in the Tree (data structure) / Tree (computer science) / Tree (computing) article?

Of course, the (B) question only has sense if the (A) question is answered positively.

As a supporting argument for the positive answer to the (B) question, I mentioned (in August 2020) the fact that the article has been listed as one of the vital CS articles in Mathematics. Now you pointed out Linked list as an example of another article from the same list. This article contains no mathematical definition of the subject (i.e. of linked lists). Since the Linked list article is a B-class article (thus of higher quality rating than Tree (data structure) which is C-class) one can conclude that being classified as a "vital CS article in Mathematics" is not a sufficient reason for the presence of a mathematical definition of the subject. In particular, there is no place for (or no reason to place) the "Mathematical definition" section in the Tree (data structure) article. Is this your reasoning?

Let us formulate the above as general conditions:

  1. An article A is a vital CS article in Mathematics.
  2. An article A has no place for a mathematical definition of the subject, even if such a definition is available.

To me, the above conditions (1)(2) are contradictory. If Linked list is a vital CS article in Mathematics then the lack of a mathematical definition is just a deficiency of the article. Maybe this deficiency is caused by an unclear distinction between Linked list and List (abstract data type). Interestingly, the latter article does have an Abstract definition section but, simultaneously, without the page being listed as a vital CS article in math. Or maybe one can ask if there is a mathematical definition of linked lists at all, especially such one that distinguishes linked lists from Array data structures. In this case it is questionable, why Linked list, Array data structure (...) are classified as vital CS articles in Mathematics.

Similarly, the opinion of User:Basbodart, namely that mathematical definitions should not appear in a CS article), is contradictory to the mere existence of the vital CS articles in Mathematics category.

If the (B) question about existence is answered to the positive, then there is a subsequent

  1. What is the appropriate size of the Mathematical definition section in the Tree (data structure) / Tree (computer science) / Tree (computing) article?

My answer to this is that the size should be proportional to the importance of the subject. Since the importance is crucial, it is OK if the Mathematical definition section is is very large. This is what I already have expressed in August 2020. All information in the Mathematical definition section is focused on the most possible simplification of the subject. The definitions of unordered trees, ordered trees, edge-labelled trees are most elementary possible, provided that the case of infinite trees is taken in consideration.

There are some parts of the text that can be considered excessive and thus can be reduced - see my notes from August 2020. To repeat:

Making other reductions [...] would result [...] in the [...] section becoming less valuable, that is, less valuable to readers that are interested in mathematical description of the subject.

I am afraid that this will also apply to your attempt to pull out the programming-related examples and the context necessary to understand them, and cut the rest. This will likely cause more damage than gain.

The concept of general audience readers has also already been discussed in August 2020. To repeat: The "Mathematical definition" section is aimed at those readers who want to see how the notion is described mathematically. If general audience readers don't want to see mathematical definitions then they just can skip the section. I am strongly opposed to a solution in which non-mathematical descriptions are presented as mathematical definitions. There are plenty such cases in the literature and most of them just provide a flawed definition. Even the current wording of the introductory paragraph would be such a case, but fortunately, there is no claim that this is a "definition".

Hundblue (talk) 17:43, 3 April 2022 (UTC)[reply]

To reply briefly:
  • There should be a precise definition of the tree data structure in this article, but for accessibility to a broader audience, that is often better done in words than in symbols. I think the definition now in the introduction is sufficient, but improvements are welcome. And yes, the first paragraph does give a definition of the tree data structure. This is required for the first paragraph of every article by MOS:OPEN.
  • There are several ways to axiomatically construct a definition for trees. All of those definitions are going to be in some sense redundant to the definition already given in words. Perhaps one or more axiomatic definitions are "crucial" for doing mathematical proofs or other theoretical computer science work, but I don't see how they would be "crucial" for explaining to readers what trees are and their properties and uses - and that's the purpose of this article. I think going through the details of all of them, and really any of them, is too much detail for this article. If you feel these definitions are important to include, I would recommend mentioning them briefly and leaving the details to cited sources, limiting the discussion of mathematical definitions to a paragraph or so.
  • Other editors seem to have decided that the lack of an axiomatic definition is not needed for the Linked list article to be considered reasonably complete, and I agree with that conclusion. If this means you think the article isn't "mathematical" enough to be included on Wikipedia:Vital articles/Level/5/Mathematics, then perhaps the theoretical computer science section should be moved to Wikipedia:Vital_articles/Level/5/Technology with the other computer-related articles.
-- Beland (talk) 18:24, 3 April 2022 (UTC)[reply]

Unordered tree section[edit]

I just dropped the "Unordered tree" subsection of "Mathematical definition". A lot of material there does not in fact appear to be defining the tree data structure, but relating trees to other mathematical structures, which is wandering off topic. That sort of thing might be more relevant to Tree (graph theory) or to the other mentioned mathematical structures. There were also some mention of how trees are used in specific applications. It's weird to have examples of applications mixed in with what is supposed to be a section on definitions, so I moved those mentions to the "Applications" section.

In addition to the problems of length and relevance, the material I removed was almost entirely unreferenced. (There were a few references that support one line each out of a long section, which it wouldn't make sense to leave in on their own without context.) Wikipedia:Verifiability requires that material "any material whose verifiability is challenged or likely to be challenged" have inline citations to reliable sources. Given the perils of relying on editors to verify abstract mathematical concepts on their own, I'm hereby requesting those citations before any material is re-added. -- Beland (talk) 18:38, 3 April 2022 (UTC)[reply]

Nested data section[edit]

I also dropped this section per WP:V, because it is almost completely unreferenced. It also seems to be introducing novel concepts that are not part of the process of defining trees. The only references are to a model of JSON and MongoDB documents as trees. This is more appropriate for the "Applications" section, where JSON is already mentioned. I added a mention of MongoDB on JSON#Uses for completeness. -- Beland (talk) 20:10, 3 April 2022 (UTC)[reply]

You apparently have suddenly got an idea that the section could have problems regarding Wikipedia:Verifiability. You never did mention such an issue before, neither in August 2020, nor now in April 2022. And you solved the newly discovered problems by removing the subsections, 2 of 3 at the time of writing this text. Now there is no point in preserving the remaining Ordered tree subsection.
This is vandalism. I will not assist in your activities. Hundblue (talk) 20:33, 3 April 2022 (UTC)[reply]
I just removed the "Ordered tree" section for the reasons described below. Yes, I just noticed the problems with lack of sourcing because I read through these long passages in detail for the first time, for the purpose of deciding what material to keep. It sounds like you are unhappy to see this material removed, especially as I'm sure it took a long time to write. I'm sorry to have to trim so much in front of the eyes of the author. As I suggested before, there might still be a place to publish this material in full in the form of a wikibook chapter, if you think that's worthwhile. But please don't equate trying to uphold Wikipedia policies and improve the quality of articles with mindless or malicious deletion of desirable material just for the fun of making other people angry. -- Beland (talk) 20:45, 3 April 2022 (UTC)[reply]

Ordered tree section[edit]

I dropped this section as well. It had similar problems in that most sections introduced possibly original concepts not supported by citations and not part of the definition of a tree. In some cases there were citations that failed verification of the claims they were supposed to support. Some of the material here had more to do with tree traversal or XPaths than defining trees, and is out of scope of this article.

The first two sections (the intro to "Ordered tree" and "Definition using horizontal order") did have plausible references (though there are some opinions that fail verification and violate WP:NPOV, and probably not all the other material in these sections is actually supported by these sources) and were reasonably related to the purpose of giving a mathematical definition. I dropped these sections because they appear to be defining the concept in graph theory, and so would be more appropriate at Tree (graph theory)#Ordered tree. I did not move them there because 1.) it seems like too much detail for that article, and 2.) there are cross-references to other (deleted) sections in this article which would need to be cleaned up. If you want to restore a cleaned-up version of this material on that article, my advice would be to make it a lot shorter, possibly just citing the referenced sources and noting that they give formal definitions, so anyone who is interested in that can just click through to those articles. -- Beland (talk) 20:38, 3 April 2022 (UTC)[reply]

Reflection of separateness[edit]

"Alternatively, a tree can be defined abstractly as a whole (globally) as an ordered tree,..." OR (A concretely defined tree does not require separation to be ordered) "with a value assigned to each node. Both these perspectives are useful: while a tree can be analyzed mathematically as a whole, when actually represented as a data structure it is usually represented and worked with separately by node (rather than as a set of nodes and an adjacency list of edges between nodes, as one may represent a digraph, for instance). For example, looking at a tree as a whole, one can talk about "the parent node" of a given node, but in general, as a data structure, a given node only contains the list of its children but does not contain a reference to its parent (if any)." ADDING You hide the root NODE and declare the parent node the same as "the child node". I maintain, this is bad form, let's start with computationally... And I have but momentarily tempered my accusing for the sake of one... — Preceding unsigned comment added by 156.99.40.14 (talk) 18:30, 27 November 2021 (UTC)[reply]

I've removed this content from the article. -- Beland (talk) 09:09, 2 April 2022 (UTC)[reply]

Move discussion in progress[edit]

There is a move discussion in progress on Talk:Associative array which affects this page. Please participate on that page and not in this talk page section. Thank you. —RMCD bot 00:32, 27 January 2022 (UTC)[reply]

Dupplicate of m-ary tree[edit]

Merge this page with https://en.wikipedia.org/wiki/M-ary_tree ? Or at least state what the difference is more clearely. — Preceding unsigned comment added by Agowa (talkcontribs) 08:09, 10 May 2022 (UTC)[reply]