Talk:Markov random field

Needs much more work[edit]

Needs much more work - discussion of inference, the Hammersley-Clifford Theorem, links, etc. Will add more. Revise away. -- Soultaco, 22:50 24 Dec 2004 (UTC)

Could someone please elaborate on this sentence in the intro?: "It can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies)." —Preceding unsigned comment added by 18.243.2.30 (talk) 09:59, 2 December 2007 (UTC)[reply]

Potential functions[edit]

From the current version of the article:

What's the relationship between the potential function and the Gibbs function?

a set of potential functions $\Phi =\phi _{c_{i}}:A_{c_{i}}\rightarrow R^{+}$ , where each $c_{i}=\lbrace i_{1},i_{2},...i_{|c_{i}|}\rbrace$ represents a clique in $G$ , and $A_{c_{i}}$ is the set of all possible assignments to the random variables represented by $v_{i_{1}},v_{i_{2}},...v_{i_{|c_{i}|}}$ . In other words, each clique has associated with it a function from possible assignments to all nodes to the nonnegative real numbers.

I find this needlessly complex and somewhat confusing. Here is how I would word it:

a set $\Phi$ of potential functions $\phi _{c_{i}}:A_{c_{i}}\rightarrow R^{+}$ , where each $c_{i}$ is a clique in $G$ , and $A_{c_{i}}$ is the set of all possible assignments to the elements of $c_{i}$ . In other words, each clique has associated with it a function from assignments (to each element of the clique) to nonnegative real numbers.

It's not clear to me why the definition needs to mention both i's and v's, or why such schematic letters are even necessary for the definition at all. (Can't we just identify the vertices of the graph with random variables rather than invoking some "representation" relation between them?) However, I'm hesitant to just unilaterally edit the article, as I'm not very knowledgeable about these things. Thoughts? Dbtfz (talk - contribs) 02:21, 4 February 2006 (UTC)[reply]

I went ahead and revised the passage mentioned above as well as much of the rest of the article. Anyone who is interested, please review and revise as needed. Dbtfz (talk - contribs) 06:02, 10 February 2006 (UTC)[reply]

I am not convinced of the basis and need for the requirement of nonnegative real numbers. Is there a reason (ideally, with a reference) for such a restriction, or could the restriction not rather be omitted? See for example the antiferromagnetic Ising model (an Ising model being considered as a special case of a Markov network). --Chris Howard (talk) 16:46, 19:55, 4 May 2008 (UTC)[reply]

Not sure, but I think its a notational problem: if we write

f_{k}=\exp(\phi _{k})

then the f_k are non-negative, and the phi_k are merely real, positive or negative. Some texts seem to call the f_k the "potential functions", while others call the phi_k the potential functions; thus the source of confusion?! linas (talk) 18:21, 28 August 2008 (UTC)[reply]

Thanks, linas, it´s fine now after this change you made. --Chris Howard (talk) 23:55, 30 December 2008 (UTC)[reply]

Equivalence of Markov properties[edit]

The three Markov properties are not in general equivalent. Equivalence hold however for distributions, which are positive everywhere, see for instance Lauritzen, `Graphical models' or Koller, `Probabilistic Graphical models'. —Preceding unsigned comment added by Derfugu (talk • contribs) 15:50, 17 November 2010 (UTC)[reply]

MRF vs. Bayesian Network[edit]

A Markov network is similar to a Bayesian network in its representation of dependencies, but a Markov network can represent dependencies that a Bayesian network cannot, such as cyclic dependencies.

Shouldn't it be such as cyclic independencies ? Also, this sentence gives the impression that MRFs are a generalization of BNs, when BNs can actually represent independencies that a MRF cannot [1, p. 393 -- chapter available online]

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. -- 139.165.16.117 at 11:28, 17 July 2007

I've fixed the latter problem. Took 15:00, 7 August 2007 (UTC)[reply]

Hold up there; it's true that BN can represent conditional INdependencies that a Bayesian network cannot, but any Bayesian network can be converted into a Markov network representing the same dependencies (as well as new ones). --12.17.136.181 00:40, 11 August 2007 (UTC)[reply]

A MRF can represent any joint PDF and thus represent any (in)dependencies provided you give the MRF the right clique structure, so BNs are a subset of MRFs. However, unlike an MRF, a BN is an explicit product of (directed and acyclic) conditional probabilities times a prior, so independence constraints can easily be “hardwired” into a BN (e.g. by using a factorised prior). It is generally more difficult to achieve (and to preserve under adaptive training) such constraints in an (undirected and cyclic) MRF. — Preceding unsigned comment added by Teleprinter Sleuth (talk • contribs) 17:50, 17 November 2011 (UTC)[reply]

Log-linear model[edit]

Asterion85 (talk) 22:18, 4 May 2008 (UTC)[reply]

In practice, a Markov network is often conveniently expressed as a log-linear model, given by

$P(X=x)={\frac {1}{Z}}\exp(\sum _{k}w_{k}\phi _{k}(x_{\{k\}}))$

it is not correct

absuming that:

$\phi _{k}=\exp(w_{k}*f_{k}(x_{\{k\}}))$

it should be:

$P(X=x)={\frac {1}{Z}}\exp(\sum _{k}w_{k}f_{k}(x_{\{k\}}))$

For references http://www.cs.washington.edu/homes/pedrod/kbmn.pdf

I made a clarifying edit to the page; however, I reversed f and phi, since I think the more common usage is to use phi to denote the field, and use f for the distribution. linas (talk) 18:36, 28 August 2008 (UTC)[reply]

undefined[edit]

There are many undefined variables in this equation.

P(X=x)={\frac {1}{Z}}\exp \left(\sum _{k}w_{k}^{\top }\phi _{k}(x_{\{k\}})\right)

Could someone specify what $'P'$ and $'w_{k}'$ represent in this equation?? JEIhrig 18:46, 12 April 2009 (EST)

P(X=x) means "the probability that the random variable X takes a certain value x (here X is a vector: all values in the network). But yes, what the heck is w, and what do the k indices loop over? — Preceding unsigned comment added by 119.75.27.50 (talk) 07:53, 12 July 2013 (UTC)[reply]

Markov network: definition or theorem?[edit]

I am confused by this article. The term Markov network is not referenced, can a reference be supplied please? Other terms such as Markov random field are in widespread use. Also, a MRF is surely a probability function that satisfies one of the Markov properties. So then it is a THEOREM not a DEFINITION that it can be factorized in terms of cliques.

I propose to edit the article along the above lines. Objections and clarification invited? Have I got it wrong? Megadeff (talk) 18:46, 29 May 2009 (UTC)[reply]

The factorization definition is usually used as it is stronger than the Markov property definition: i.e. it is possible to construct a graph which satisfies the Markov properties but doesn't factorise (an example on a 4-cycle is given here). Under various conditions (positivity of the density or decomposability of the graph) the definitions are equivalent. —3mta3 (talk) 10:50, 30 May 2009 (UTC)[reply]

As for the name, personally I would call it an undirected graphical model.—3mta3 (talk) 11:13, 30 May 2009 (UTC)[reply]

Thanks. Physicists call things satisfying the clique product formula a 'Gibbs state' or 'Gibbs random field'. Such use has been around a very long time. Maybe in statistics the product formula has taken over, and if that is what they want...fine. But could we keep the word 'Markov' to mean just that? Maybe a better solution is to develop the page on MRF, and let the statistical applications reign here. Megadeff (talk) 08:41, 1 June 2009 (UTC)[reply]

I think the two terms are related enough that it makes sense to keep them on the same article for now, though I would agree to renaming it to MRF, as that seems to be the more commonly used term. As for the definition, we could probably mention them both, my impression is that the differences only arise in pathological cases, so it seems like most people don't really differentiate between the two (well they don't appear to in the Kindermann and Snell book). —3mta3 (talk) 17:34, 1 June 2009 (UTC)[reply]

I've renamed and changed it to include both definitions, and tried to clarify the difference. —3mta3 (talk) 18:59, 2 June 2009 (UTC)[reply]

I am interested in the 4-cycle example you mentioned, where, if I understood you correctly, the 4-cycle is an independency map of the probability distribution (or else: what does it mean that the Markov property is satisfied?) but the distribution cannot be factorized using the 4 clique potentials. Skimming through the paper, the only image of a 4-cycle I found is on page 12. Is that the one you are referring to? -- 131.159.0.7 (talk) 19:30, 28 July 2009 (UTC)[reply]

Yes, that was the example I was referring to, and what I meant by the Markov property being satisfied. I haven't actually read that paper, so can't really comment on anything else in it, but the same example also appears in this paper (Examples 4 and 7) where they explain why it doesn't factorise. —3mta3 (talk) 16:37, 29 July 2009 (UTC)[reply]

Thanks 3mta3. Unfortunately the page has become obscurer in a couple of ways since your edit. The condition Z=1 is irrelevant since it can always be taken into the factor functions. I'll keep away from the statistics bit, but once again I propose the separation of the physics meaning of MRF from the use in statistics. I'm happy to arrange this, but it would be better by consensus. I _think_ the main problem is difference of approach between the two communities. Megadeff (talk) —Preceding undated comment added 12:41, 27 July 2009 (UTC).[reply]

Okay, I've cleaned up the obvious errors. Personally, I wouldn't separate the articles, as they do refer to the same idea (a probability distribution with Markov properties over a graph), unless there is a large amount of physics specific material (in which case I think it would be better to keep the main article, and have a Markov random fields in physics type sub-article). —3mta3 (talk) 16:08, 27 July 2009 (UTC)[reply]

I'll think it over. Megadeff (talk) 18:56, 27 July 2009 (UTC)[reply]

MRFs for the representation of probability distributions[edit]

First of all, it should be noted that while the (conditional) independencies represented by an MRF (as given by the Markov properties) are of utmost importance, they alone are essentially meaningless if one does not also describe how one can generally represent probability distributions that are faithful to the (in)dependencies being represented. Most probably, the causality is even reversed here, as it seems more natural to start off with a probability model (such as a product of potential functions) and then think about the independencies that it represents and how to represent them graphically. It is quite strange of you, 3mta3, to state that you "fix[ed] some errors" in the article, when in fact you basically reverted the respective section of the article to a previous state, which, most notably, contained a highly restrictive formulation of the way in which a Markov random field may be used to represent a full-joint probability distribution over a set of random variables. In the general form, the product of clique potentials for a given state of the network need not be normalized, i.e. the resulting product is only required to be proportional to the probability value of the state. Regardless of normalization, the Markov properties are satisfied, as can easily be shown. What is missing in the article is therefore the general formulation of how the full-joint distribution over a set of random variables can be compactly represented by an MRF such that it is in accordance with what the graphical structure asserts. This is highly relevant, because in practice, that is how MRFs are used - at least in artificial intelligence and machine learning, my field of research. (I would in fact argue that the state of the article in April 2009 (before you started editing it) contained more of the relevant formulas than it does now.) Your talk about (directly) factorizing the full-joint is only a special case, which I explicitly addressed in my previous version of this section of the article. So I am curious: What exactly are the errors that you claim to have fixed? -- 131.159.0.7 (talk) 16:50, 28 July 2009 (UTC)[reply]

As Megadeff points out above, the normalization constant can simply be included in one of the φ's, so the definition with the Z is no more general than the definition without.—3mta3 (talk) 17:43, 28 July 2009 (UTC)[reply]

Requiring that the normalization constant is included in one of the factors amounts to an additional assumption, thus a special case. And I do hope you agree that special cases are less general. If no assumptions about clique potentionals are made, the partition function is required (which is why it is, by the way, always present in the literature on MRFs as far as I have come across it - including the paper by Richardson and Domingos that is linked to above). Furthermore, you fail to mention your assumption. Therefore, whether your changes actually "fixed errors" is all the more questionable. You did not answer my question: What are the errors that you claim to have fixed? -- 131.159.0.7 (talk) 18:08, 28 July 2009 (UTC)[reply]

Note that, in particular, inference problems are simplified by assuming Z=1, since an explicit calculation of Z, which would otherwise be necessary and is computationally expensive, can be omitted. (Please also note my question above.) -- 131.159.0.7 (talk) 18:21, 28 July 2009 (UTC)[reply]

Ah, I think I see what you are getting at. My point is that, purely from a theoretical perspective, if we can write the joint density of a single MRF as:

p(a,b,c)={\frac {1}{Z}}\phi _{1}(a,b)\phi _{2}(b,c)

where Z is a constant,

then we can also write:

p(a,b,c)=\phi _{1}^{*}(a,b)\phi _{2}(b,c)

where

\phi _{1}^{*}(a,b)={\frac {1}{Z}}\phi _{1}(a,b)

.

Thus the claim that restricting Z=1 would result in a stronger definition is clearly false (this was the error I claimed to have fixed). Whether or not we know the numeric value of Z is irrelevant, as long as it is finite the definitions are equivalent.

However in most practical circumstances we don't work with a single distribution but instead with a parametrised family of MRFs, in which case Z is no longer a constant, but a partition function over the parameter space (such as in the log-linear model example in the article). Thus knowledge of Z is important as it will influence the likelihood function.

Does that answer your question? —3mta3 (talk) 08:23, 29 July 2009 (UTC)[reply]

While it is true that allowing an arbitrary normalization constant does not in any way change the class of probability distributions that can be modelled, imposing Z=1 does put unnecessary restrictions on the clique potentials and therefore leads to a less general definition of the way in which a distribution can be factorized using MRF semantics. I would suggest that the most general definition should be used in this article - as it is also found all over the literature. I admit that when I rewrote the section, I did not properly read the part in which you describe the limits of factorization in general and thought, because you assumed Z=1 above, you were merely naming (restrictive) special cases where Z=1 (such as the case of the chordal graph in which factors as they would appear in a Bayesian network are used, i.e. marginal and conditional probabilities). What I do not understand is why you did not simply change the sentence in question but instead reverted the entire section, also deleting, for example, the extra information on the range of the potential functions I added.

From a computational perspective, the presence or absence of knowledge about Z is hardly irrelevant. Parametrised families of MRFs are not really relevant to the scope of this article, but if anything, their existence favours the definition that includes Z.

I think one of the key points here is that to you, an MRF is just a representation of (conditional) independencies of a joint probability distribution, whereas to me, this qualitative model is inextricably linked to a quantitative model that describes the distribution itself. Therefore, you are only concerned about describing a sufficient condition for a distribution to have a corresponding MRF (namely that it can be factorized in the way you mention), whereas I am interested in naming the most general way in which MRFs can be used to represent probability distributions.

Regarding the example on the limits of factorization that you mentioned, I would be interested in learning more about this. Please see my question in the section above. -- 85.181.84.137 (talk) 13:56, 29 July 2009 (UTC)[reply]

I still don't understand your claim how restricting Z=1 puts unnecessary restrictions on the clique potentials. Can you give an example? As for deleting the extra information on the range of the factor potentials, that was unintentional (though it is somewhat obvious, since a density must be non-negative). Also, see my reply above. —3mta3 (talk) 16:53, 29 July 2009 (UTC)[reply]

Finite graphs[edit]

Shouldn't one say somewhere that the underlying graph has to be finite? When this is not true, many of the stated results do not hold in general (e.g., the equivalence between local and global Markov properties), because of the possibility of phase transitions. Such a remark would be particularly useful, since the first example given is the Ising model, which is very often defined on infinite graphs.--YVelenik 15:36, 7 April 2010 (UTC) —Preceding unsigned comment added by Velenik (talk • contribs)

Independency map[edit]

I took out the following two paragraphs from the definition because they are very badly worded and confusing and not really needed in a definition of a Markov random field. They were added by user 131.159.0.7.

In other words, a set of random variables X is a Markov random field with respect to a graph G if the joint probability distribution P(X=x) over if and only if graph separation in G implies conditional independence: If two nodes corresponding to

X_{i}\in X

and

X_{j}\in X

are separated in G after removing from G a set of nodes Z, then P(X=x) must assert that

X_{i}

and

X_{j}

are conditionally independent given the random variables corresponding to Z. If this condition is satisfied, we say that G is an independency map (or I-Map) of the probability distribution.

Many definitions go even further and require that G be a minimal I-Map, i.e. an I-Map from which none of the edges could be removed without destroying its I-Mapness. (It is reasonable to require this as it leads to the most compact representation, which involves as few dependencies as possible; note that a complete graph is trivially an I-Map.) In the case where G is not only an I-Map (i.e. it represents no independencies that are not indicated by P(X=x)) but also represents no dependencies that are not indicated by P(X=x), G is called a perfect map of P(X=x), as it represents precisely the set of independencies indicated by P(X=x).

However, perhaps the fact that the graph G with respect to which a random field is a Markov random field is sometimes referred to as an independency map or I-Map should be mentioned in the article? --Gustav (talk) 22:25, 1 January 2012 (UTC)[reply]

Not sure about the wording, but at least in this book (http://books.google.it/books/about/Probabilistic_reasoning_in_intelligent_s.html?id=AvNID7LyMusC&redir_esc=y) on p96 a Markov Network is defined as a minimal I-Map of a dependency model. Because I haven't read another book on Markov Networks I can't judge if that is a unique thing of this author, but in his book the I-Map is the basic concept for MNs. — Preceding unsigned comment added by 87.253.115.77 (talk) 19:23, 7 May 2013 (UTC)[reply]

backwards notation[edit]

This article uses phi and f exactly backwards from how its commonly used; this is most irritating. I once changed this around, but someone has changed it back. I believe that phi = log f is the common notation, right? phi's are always potentials, and never probabilities. That is,

P(x)=\prod f=\exp \sum \phi

Can we please put it back?

...Hmm. Its even worse that that. The current article uses f to stand for the delta function, and w for the potential. This seems like a doubly-non-standard notation, and goofy terminology to boot. Yeah? Who talks like this? linas (talk) 14:17, 9 April 2012 (UTC)[reply]

Computer scientists and econometricians talk like this. They have a rich tradition of making well-known maths look amazing and new by obfuscating notation. Sadly, when they do come up with something genuinely new and interesting, they have all of us scratching our heads... 2A01:CB0C:CD:D800:9047:E0B9:299:591D (talk) 08:28, 9 August 2022 (UTC)[reply]

When the probability distribution is positive?[edit]

The introduction to the current article has the sentence: "When the probability distribution is positive, it is also referred to as a Gibbs random field, because, according to the Hammersley–Clifford theorem, it can then be represented by a Gibbs measure." There is nothing preceeding this statement that explains what "the probability density" is. There is no explanation of what a "positive" probability density is. My guess is that a "positive" density is a density function that is strictly greater than zero on its domain.

Tashiro (talk) 15:41, 11 October 2012 (UTC)[reply]

This bothers me too. In the finite case (finite graph and finite target space for the pointwise values of the fields (states)), I suspect it means that each state has positive probability. I agree that this should be stated explicitly. But it's particularly important in the case when the graph is infinite, because then the state space is uncountable. Maybe then it means that each set of states defined by finitely many conditions (at finitely many points of the domain graph) should have positive measure. These are the open sets in the natural compact-open topology on the state space, if the target space is finite. 178.38.78.134 (talk) 02:51, 22 January 2015 (UTC)[reply]

Markov random field definition[edit]

It would be nice to refer to "random field" and explain each word in the term. Compare for example with https://en.wikipedia.org/wiki/Gaussian_free_field. A random field does for example not need to be discrete. Here this is assumed to be the case without much thought. If the Markovian property is only defined for discrete fields, that must be stated as the reason for jumping to such a very specific model. I bet if you give a mathematician the mathematical objects of "random field" and "Markov property", he will come up with totally different things. :-) Anne van Rossum (talk) 13:49, 16 August 2014 (UTC)[reply]

Assign sigma-algebras ${\mathcal {F}}_{U}$ to open subsets $U\subset T$ of a topological space $T$ in such a way that for any $U_{\alpha },\alpha \in I$ we have ${\mathcal {F}}_{\bigcup _{\alpha }U_{\alpha }}=\bigvee _{\alpha }{\mathcal {F}}_{U_{\alpha }}$ . Call $U\mapsto {\mathcal {F}}_{U}$ Markovian if for every pair $U_{1},U_{2}$ , such that $U_{1}\cup U_{2}=T$ , we have ${\mathcal {F}}_{U_{1}}\perp \!\!\!\perp {\mathcal {F}}_{U_{2}}\mid {\mathcal {F}}_{U_{1}\cap U_{2}}$ . In case we have a (possibly distributional) random field with parameter space $T$ , the ${\mathcal {F}}_{U}$ 's are supposed to be generated by the restrictions of the field to the open subsets. Has anyone seen something equivalent to this definition in the literature? --AlexShamov (talk) 16:58, 22 August 2014 (UTC)[reply]

Finite degree?[edit]

In the definition of Local Markov property -- should the neighborhoods be finite? Should each vertex of the graph have finite degree? 178.38.78.134 (talk) 02:41, 22 January 2015 (UTC)[reply]

Local Markov properties in Definition[edit]

The definition says "Given an undirected graph G = (V, E), a set of random variables ... form a Markov random field with respect to G if they satisfy the local Markov properties:" and then lists three properties: the Pairwise Markov property, the Local Markov property and the Global Markov property. Are these three the "local Markov properties" that the first sentence of the Definition mentions? Then it is a little confusing that the "Global Markov property" is one of the "local Markov properties". Or does the phrase "local Markov properties" only refer to the second one of the three? 80.235.82.99 (talk) 14:17, 6 March 2015 (UTC)[reply]

I agree, this is atrociously badly written. It seems to be symptomatic of the field. I am waiting for some actual mathematicians to take an interest and write a decent textbook about it. 2A01:CB0C:CD:D800:9047:E0B9:299:591D (talk) 08:26, 9 August 2022 (UTC)[reply]

Physics?[edit]

The article lead states that "in the domain of physics and probability, a Markov random field is a set of random variables having a Markov property described by an undirected graph". So how are Markov random fields relates to physics? —Kri (talk) 22:27, 24 June 2015 (UTC)[reply]

"is particularly clear in the following formulation..."[edit]

Still not very clear, though. Does this field attract people that are unusually unskilled in mathematical writing? (Computer scientists, econometricians...). 2A01:CB0C:CD:D800:9047:E0B9:299:591D (talk) 08:25, 9 August 2022 (UTC)[reply]

Unclear passage[edit]

The section Clique factorization begins as follows:

"As the Markov property of an arbitrary probability distribution can be difficult to establish ...."

But he previous section, Definition, defines several kinds of "Markov property" only for random variables and not for distributions.

Or am I missing something?

In case I am not missing something, I hope someone knowledgeable about this topic can make this passage clearer.

Also: The article uses the word "variable" when I am almost sure that it means "random variable".

If that is correct: That might be fine for mathematicians in this field talking among themselves. It is not fine for an encyclopedia. Spend the extra minuscule bit of energy and make Wikipedia clear to the reader. 2601:200:C000:1A0:1922:FB59:1D5E:AB37 (talk) 03:38, 26 August 2022 (UTC)[reply]