Talk:Simulated annealing

Mathematics High‑priority

	Mathematics portal This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.MathematicsWikipedia:WikiProject MathematicsTemplate:WikiProject Mathematicsmathematics articles
High	This article has been rated as High-priority on the project's priority scale.

Statistics Low‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.StatisticsWikipedia:WikiProject StatisticsTemplate:WikiProject StatisticsStatistics articles
Low	This article has been rated as Low-importance on the importance scale.

On the intro section[edit]

This was the original reading of the introductory section:

Simulated annealing is a generic probabilistic heuristic approach to global optimization problems. It locates a good approximation to the global optimum in a large search space with reasonable probability. It was independently invented by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983, and by V. Cerny in 1985.

The name comes from annealing in metallurgy, a technique involving heating and controlled cooling of a material to improve its structural properties by removing crystal defects. In this process the metal reaches its most stable configuration, minimizing its total internal energy.

Three remarks:

Shouldn't this be called a META-heuristic? As I understand, a heuristic is a non-guaranteed procedure that applies to a specific problem, e.g. "pick the nearest unvisited node" would be an heuristic for Euclidean TSP; whereas a meta-heuristic would be a general appraoch that can be used for almost any problem -- such as simulated annealing, genetic algorithm, etc.
I've never heard of a meta-heuristic. I would not call it a heuristic, but a heuristic method (as opposed to a deterministic method, like SQP or steepest descent). moink 04:37, 2 Aug 2004 (UTC)
Perhaps I hark from a different fold, but I have always seen SA (and GA, greedy, tabu, etc.) being labeled as "meta-heuristics".
Basically, as said above, a heuristic is any technique which you cannot prove is good, but you intuitively expect to be better than random search — usually, for a specific class of problems, or for a specific distribution of instances of a problem. For Euclidean Traveling Salesman, a heuristic could be "compute a Delaunay triangulation of the sites and look for an Hamiltonian path on it."
A meta-heuristic is a heuristic that (1) is supposed to work on an very general class of problems, such as "optimizing a black-box function" or "build a boolean string classifier from an arbitrary set of examples"; and (2) includes among its steps calls to user-given black-box procedures to do the problem-specific steps — enumerate the "neighboring" states, compute transition probabilities, discard or merge partial solutions, etc. Usually these black boxes are themselves heuristics, specific to the user's problem; hence the name "meta-heuristic".
Jorge Stolfi 01:30, 5 January 2006 (UTC)[reply]
The claim that simulated annealing finds a GOOD approximation to the global optimum with REASONABLE probability is and rather misleading (and, strictly speaking, rather meaningless). Simulated annealing may satisfy some users, but it is easy to make up instances where it will deliver, say, a solution 2³⁰ times worse than the global optimum, with probability 1-2^-30, even with a cooling period of 2³⁰ steps. There is no reason to assume that the reader's optimization problems will be of the first type, and not of the second type.
Yes, this should be more clearly explained. We should give examples of the types of problems in which it is likely to work and those in which it is likely not to work. moink 04:37, 2 Aug 2004 (UTC)
Metallugical annealing does NOT "minimize the total internal" energy, by a long shot. Ignoring the nuclear energy and quantum noise, it will usually lower the total bond energy by only a little bit, and stop still very far from the optimum. Even if cooling is slow enough to turn the whole piece into a single perfect crystal, if the crystal axes happen to be oriented in a non-optimum way, the chances of them flipping to the optimum one are nil.

Jorge Stolfi 20:34, 30 Mar 2004 (UTC)

Feel free to fix it. moink 04:37, 2 Aug 2004 (UTC)

This statement was deleted:

If T is infinity, it moves around randomly.

It is not correct for the "classical" transition probability function, because that version would always take a downhill move, even when T=&infinity;. (The phrase is valid for a physical system, though; which only underscores the bogosity of the "physical analogy" claims.)
Jorge Stolfi 03:49, 11 Jun 2004 (UTC)

Yeah, it's more correct to say that the method was inspired by metallurgy than that it is a direct analog. moink 04:37, 2 Aug 2004 (UTC)

This too was deleted:

Nonetheless, simulated annealing can be very effective at finding good suboptimal solutions.

This is definitely POV (or, rather, a meaningless statement — it all depends on what one considers "very effective" and "good solution").
In fact, it seems that the only true claim that can be made about SA is that, unlike the greedy method, it will not get stuck in the first local minimum. One cannot even claim that it will outperform the trivial method ("generate N random states, pick the best one").
Jorge Stolfi 04:46, 11 Jun 2004 (UTC)

No, but one can say that in a certain class of problems, it will with high probability outperform a random search. moink 04:37, 2 Aug 2004 (UTC)

I am done for a while. There is still a major problem in the article, which is the complicated interation between the neighbor selection fucntion and the energy-determined transition probabilities. They should perhaps be merged into a single "move" function.
Jorge Stolfi 06:48, 11 Jun 2004 (UTC)

Great! Feel free to change it. moink 04:37, 2 Aug 2004 (UTC)

I don't understand the above comment. The connection between neighborhoods and energy functions is explicitly made in the Hammersly-Clifford theorem, and its converse. One view is a Gibbs random field, one a Markov random field, but they are essentially equivalent. -ska

See the section "Greedy optimization" below. --Jorge Stolfi 01:30, 5 January 2006 (UTC)[reply]

Sheesh.

This is the biggest pile of nonsense I've read in years. Annealing in metals has absolutely nothing to do with "thermodynamic free energy" and this in particular -- "the rate of cooling dictates the magnitude of decrease in the thermodynamic free energy, with slower cooling producing a bigger decrease" is a total crock.

Hardening and annealing in general refer to ferrous metals. At room temperature they have one crystalline organization. Above a transformation temperature they have another. The crystal structure above the transformation temperature is generally stronger. In order to achieve a harder, stronger form, one heats the material above the transformation temperature, then cools it quickly enough that the crystal structure does not have time to change (termed "quenching"). It is now locked into the high-temperature form even at room temperature.

Annealing is the opposite. If one has a hard, strong material but wishes to draw it back to a ductile state, you heat it above the transformmation temperature then let it cool slowly enough to change crystal structure to the softer, weaker form.

Relevant terms would be martensite, bainite, and transformation temperature. This is a gross simplification, but 1,000 times more accurate than what was written here.

Annealing has nothing nothing nothing to do with any imaginary "thermodynamic free energy." Whoever wrote this nonsense should have to take several classes in materials science and pass with at least a C before being allowed to edit wikipedia again. 210.22.142.82 (talk) 12:03, 10 February 2016 (UTC)[reply]

Interesting. I'm somewhat familiar with simulated annealing literature (in optimization context), but not studied any materials science (so I can't judge the validity of the above claim). The stereotypical treatment on SA often just gives a very vague description of the metallurgy side of things and why one would do annealing, usually it then moves to an intro on statistical mechanics concepts (Boltzmann distribution, and yes, thermodynamic free energy, entropy, and so on). The Ur-Example would be the original article by Kirkpatrick et al. Such texts don't make claims exactly like in the paragraph that was removed (usually they would say that if the material has a regular crystalline structure it *is* in a low energy state), but that's probably where that particular passage came from, or why that particular passage had been there without anybody noticing since 2012 Ebling Mis (talk) 07:10, 6 April 2016 (UTC)[reply]

The paragraph on metallurgy goes into excessive detail on a topic that is not relevant to this page. I think it should be reduced to the first sentence: the reader only needs to have a vague idea of why it might be named after annealing. They don't need the details of how annealing works on the atomic level. Thoughts? LudwikSzymonJaniuk (talk) 09:32, 7 January 2018 (UTC)[reply]

Old temp page still in use?[edit]

There's a page sitting in Simulated annealing/Temp that hasn't seen any significant activity since it was created in October 2004, and isn't linked to anywhere. Does anyone know what's up with it? Bryan 07:10, 29 May 2005 (UTC)[reply]

Credit for Metropolis et al?[edit]

Hello. Wasn't simulated annealing introduced first in the article:
Metropolis,N., A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, "Equation of State Calculations by Fast Computing Machines", J. Chem.Phys.,21, 6, 1087-1092, 1953.

?

As I understand it, no. The Metropolis algorithm is a way of sampling from a distribution

P(x)={1 \over Z}\exp(-\beta E(x)).

for some value of inverse temperature

\beta

and for some energy function

E

that is a function of a vector

x

. It was typically used to computed expected values of functions over x. I believe that

\beta

was held fixed. Kirkpatrick realized (and proved) that if

\beta

decreased slowly as time approaches infinity, then the expected value of x approaches the global minimum of E. -- hike395 02:01, 4 Jun 2005 (UTC)

Hike has it basically correct. Metropolis introduced the first MCMC sampling method from a Gibbs distribution (a very important development!). This is sampling for a fixed temperature. Simulated annealing, though was introduced in the early to mid 1980's by Kirkpatrick et. al., and I think Cerny independently. Essentially these methods turn the homgeneous Markov chains of the MCMC samplers into non-homogenous chains. The MCMC sampler connection should be made explicitly in these pages.

Maybe you like the following paragraph I found in "Optimizing simulated annealing schedules with genetic programming" from Bölte and Thonemann, gives proper credit to both Metropolis and Kirkpatrick, in my opinion.

"Simulated Annealing originated in statistical mechanics. It is based on a Monte Carlo model that was used by Metropolis et al. to simulate a collection of atoms in equilibrium at a given temperature. Kirckpatrick et al. were the first who applied Simulated Annealing to solve combinatorial optimization problems."

Hello! I've been doing some investigating and this 'paper': Gubernatis JE. (2005) Marshall Rosenbluth and the metropolis algorithm IN: Savannah, GA, 46th Annual Meeting of the Division of Plasma Physics of the American-Physical-SocietyNOV 15-19, 2004 Amer. Inst. Physics

States how Rosenbluth derived the entire algorithm, and Metropolis only gave the group (Teller&Teller, Rosenbluth & Rosenbluth) time on the machine. I've edited the introduction accordingly. Pureferret (talk) 20:09, 7 April 2011 (UTC)[reply]

let's cool down wp a bit[edit]

... using simulated annealing. i have recently suggested that the cooling process could be initiated by slowly raising the minimum amount of reference information for matured articles with software aids like combo-boxes for reference type, etc. on top or instead of the underused comment line. (http://thread.gmane.org/gmane.science.linguistics.wikipedia.misc/25805) that should cool down at least vandals or scatterbrains... ;). ok let's hear more opinions. -- Kku 09:30, 7 December 2005 (UTC)[reply]

Greedy "optimization"[edit]

This section says:

In some implementations of the method (including the original formulation by Kirkpatrick et al.), the condition to perform the move is "en < e or random() < P(...)". That is, downhill moves are always taken, irrespective of the value of P. However, this "greedy optimization" is not only unnecessary, but it is in fact quite opposite to the fundamental principle of SA.

Is this "quite opposite to the fundamental principle of SA" according to whom? This is an unsubstantiated POV, unless there is a citable reference. I'd say that the "fundamental principle" of SA is that some uphill moves have to be accepted. There's no principle that says that not every downhill move has to be accepted. Itub 21:12, 4 January 2006 (UTC)[reply]

Unfortunately it is hard to tell exactly what is the "fundamental principle of SA", because there is quite some distance between the stated inspiration (the annealing of a physical system) and the concrete implementations that are usually given. The main discrepancies are the "greedy optimization" and in the serial testing of the neighbors. To see the latter problem, suppose you have neighboring states s1,s2,... and your probability function assigns to them probabilities p1,p2,.... Typical implementations of SA (without greedy optimization) do something like this

     if rand() < p1 then select s1; else 
     if rand() < p2 then select s2; else 
     if rand() < p3 then select s3; else
     ...

Obviously this algorithm will NOT select state si with probability pi. So the physical analogy and the exp(-dT/T) formula are quite misleading.

Anyway, we can guess that the "fundamental principle of SA" is the idea that moving downhill is not always the best choice, especially when T is high. And this principle is followed to some extent by standard implementations, even those with the "greedy optimization". In the above example, suppose that s3 is downhill but s1 and s2 are uphill; the standard algorithm will go uphill with probability p1 + (1-p1)p2. If we were to swap s1 with s3 in the list of neigbors, standard SA would always move to s3!

Said another way, the "greedy optimization" is equivalent to setting momentarily T=0 when the neighbor si has lower energy. Note, this "instant freezing" does not occur when s has such a neighbor, but only when (if) the algorithm gets to that neighbor while scanning through the list. Perhaps putting it this way makes it more obvious that the "greedy optimization" is against the "gradual cooling" spirit of SA --- a programmer's lapse from "SA thinking mode" into "greedy thinking mode"?

Said yet another way, if the "greedy optimization" is good, then it should be even better to just run the greedy meta-heuristic after each transition, and switch to SA only if the current state turns out to be a local minimum. Would you consider this to be an improvement on SA?

All the best, --Jorge Stolfi 00:56, 5 January 2006 (UTC)[reply]

I think the confusion is this "serial testing of the neighbors". In the systems I work with, the neighbor list is infinite and not enumerable, so the way it works is: 1) you generate a neighbor randomly; 2) you decide whether to accept it or reject it. In this context, accepting all the neighbors that go downhill is not "bad", its just a particular case of P where P = 1 where En < E (it should be pointed out that when generating this random move, the energy goes up more often than not). I can see how this could cause trouble if you do a serial testing of the neighbors, but this point should be specified in the article. Itub 03:40, 5 January 2006 (UTC)[reply]

An easier argument may be: if you set P=1 when dE<0, then the probabilities pi = P(E(si)-E(s)) will not add to 1, and hence they cannot be the actual transition probabilities.
As for your comment: if the neighbor list is infinite, then the enumerator must pick the first/next neighbor with non-uniform probability. In that case, it is easy to see that the actual probability of moving to si will not be the specified probability pi. I believe that they are different even if we assume that the neighbor list is finite, that the next neighbor is picked at random (with repetitions and uniform probabilities), and that the specified pi add to 1. But I cannot check that now...
Note that the algorithm "works" in some fashion even with serial testing and the "greedy optimization". With serial testing , the blabla about Metropolis-Hastings and exp(-dT) are pure nonsense; adding the GO just turns them into double-strength nonsense.
All the best, --Jorge Stolfi 08:16, 5 January 2006 (UTC)[reply]

Why would pi have to add up to 1? pi is not the probability of transition to state i, but the probability of the move being accepted. The transition probability is the product of pi and the probability of attempting the move. For example, let's say you have 10 neighbors and pick one randomly with uniform probability (that is, the probability of attempting to move to any given neighbor is 0.1). Let's say all the neighbors have pi = 0.5. Each one will contribute with 0.1*0.5 = 0.05 to the sum of the transition probabilities. This adds up to 0.5. The remaining 0.5 is the probability of staying in the original state, because the move was rejected. Therefore, all the probablities add up to 1. Now, let's say that all the neighbors are downhill and have pi = 1. Each one will contribute with 1*0.1=0.1 to the sum of the transition probabilities, which again will add up to 1 (the probability of staying in the original state is now zero). Itub 17:02, 5 January 2006 (UTC)[reply]

I should also point out that in the original version that uses P=exp(-dE/kT), the test is still always random() < P(...). It just happens that, for that function, P > 1 when dE < 0 (in actual implementations, yes, you might use an or to save computing the exponential function, but it's not strictly necessary). Itub 17:10, 5 January 2006 (UTC)[reply]

Indeed, that is the problem: the alleged physical justification assumes implicitly that the pi = exp(-dE/T) are the *transition* probabilities; but when one uses the pi as *acceptance* probabilities in serial testing, the transition probabilities are something completely different. Thus the physical "Metropolis-Hastings/exp(-dE/T)" justification simply does not hold; and neither the physical nor the intuitive justification hold when one uses GO.
I am adding a couple of paragraphs to the article which may help settling this argument. Please check.
All the best, --Jorge Stolfi 20:45, 5 January 2006 (UTC)[reply]

I'm still unconvinced (although the section about saving the best solution is helpful :). The main problem I see is that the use of the term "greedy" in this section is an exaggeration. A "greedy" optimization is one that always goes downhill. A simulated annealing implementation that just happens to accept every downhill move that it sees, due to its choice of P(), is not greedy as long as it also tries uphill moves. My suggestion is to delete the entire greedy "optimization" subsection, actually.

I must insist that pi = exp(-dE/T) has never been a transition probability, even in the original physical justification (Metropolis Monte Carlo). All the equations I've seen that deal with transition probabilities in MC methods include two factors: one is the probability of attempting the move (which can be largely arbitrary as long as certain conditions are met), and the other is the probability of accepting it (which is the Metropolis test). Itub 22:54, 5 January 2006 (UTC)[reply]

Metaphor too complicated?[edit]

It seems to me that the general description of the algorithm could be simplified greatly if the metallurgy metaphor was dumped for a paragraph.

I'd add something like:

Simulated Annealing is a search process that has a better chance at finding global maxima than simple hill climbing. This is because at the beginning of a space search it's random steps are very large and become smaller as the search continues until finally the algorithm decides on a local maxima from the best place it has found in the space.

It's a little rough because I'm terrible at describing algorithms, but it frames the concept in terms of another space search technique that the reader would either A) be familiar with or B) probablly able to understand a little bit more easily.

Would a paragraph like this be welcome?

I would favour such a change. Most people learning about simulated annealing are more likely to have already learned about hillclimbing than to have learned about metallurgical physics, although it's a vivid metaphor. I think it also gets across the point that simulated annealing is a search algorithm, and not some kind of physical simulation. Deco 23:15, 15 May 2006 (UTC)[reply]

I do not favor such a change. The pedagogy of this article is consistent and very good at worst. since there is nothing factually wrong with this section, I vote that it stays in. 64.134.235.11 (talk) 20:45, 7 November 2012 (UTC)[reply]

Initial temperature[edit]

Currently the article says:

The initial temperature must be large enough to make the uphill and downhill transition probabilities about the same. To do that, one must have some estimate of the difference e' − e for a random state and its neighbours.

So far this is the best advice I've found on the subject of choosing an initial temperature, and it still doesn't spell out how exactly I do it - the author seems to know but isn't giving details. I'd like to see some added content at least giving an example of how to choose an effective initial temperature. Deco 23:24, 15 May 2006 (UTC)[reply]

For what it's worth, I just found out about the paper "Computing the Initial Temperature of Simulated Annealing" which probably answers my question. :-) Maybe a good reference to add? Deco 23:28, 15 May 2006 (UTC)[reply]

If you add it, I think you should link it as http://dx.doi.org/10.1023/B:COAP.0000044187.23143.bd, which is a more permanent URL. Itub 21:24, 16 May 2006 (UTC)[reply]

Quenching[edit]

This section reads highly unencyclopedic:

If the modified Simulated Quenching (SQ) works, fine, but it should not be called SA. SA is often used for hard problems where the chances are reduced for being trapped in local minima. SA also is used in problems with complex constraints since the method of rejection is simply applied to keep the sampling within the constrained region(s). In practice, when the need is clear to use a sampling technique, it is best to do at least a few long runs with SA, and then use SQ modifications to speed up calculations as long as the same results are achieved. To apply any generic sampling technique like SA over many classes of nonlinear problems, it should be expected that tuning of some basic search parameters will be required.

First, the first sentence is (oddly worded) POV. Second, the section doesn't explain what SQ is in comparison to SA. Third, the second half reads like a how-to. I don't know SQ so I can't modify this, but maybe someone here can. Thanks! ~ trialsanderrors 18:56, 1 September 2006 (UTC)[reply]

Monte Carlo[edit]

This is a good example of a Monte Carlo method, so shouldn't there be a reference to Monte_carlo_algorithm?

Shouldn't there also be a reference to the Ising model? (Might be a biased point of view, since I did my thesis on the Ising model).

I was struck how simulated annealing resembles the way the Ising_model is usually simulated; this can be considered simulating annealing where one usually takes the Metropolis acceptance probability as given above. I'm inclined to think that the people who invented simulated annealing were acquainted with the simulation of the Ising Model and invented it as a generalization of the method.

(The fact that SA resembles the Ising model is not a coincidence, since an adapted version of the Ising model (with conserved spin, where spin up represents the presence of a particle and spin down represents the absence of a particle) can actually be used to accurately model the cooling down of a dense gas/liquid.)

Energy conservation :)[edit]

How do you tell that "Therefore the goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy"? --Javalenok 22:43, 11 November 2006 (UTC)[reply]

Here Thermodynamic free energy is meant, so it is ok with conservation. 203.244.221.2 07:39, 23 January 2007 (UTC)[reply]

Dates need to be corrected[edit]

The date that Kirkpatrick et al. came up with simulated annealing is wrong. I don't have the information handy as to when they first published/presented the idea, but I know it was before 1983, because during the summer of 1982 I worked with Scott Kirkpatrick on some applications of simulated annealing to circuit design, and they had already been working with the idea for a little while at that point. 141.149.211.196 20:42, 6 April 2007 (UTC)[reply]

It may be hard to document the date they "came up" with simulated annealing. What we have is the date of the oft-cited Science paper about it. Maybe just rephrasing the statement so that it reads "first published by Kirkpatrick in 1983..." or something to that effect, rather than saying "invented" (of course, if you know of an earlier publication feel free to add it!). --Itub 07:56, 10 April 2007 (UTC)[reply]

Plain English section[edit]

The "In Plain English" section is inadequate for several reasons.
First, it does not try to describe SA; those comments could fit almost any meta-heuristic. SA is a specific meta-heuristic, that moves from state to state according to a fairly specific method, that depends on a "temperature" variable. An "explanation" that does not describe the method nor explains the role of temperature is not an explanation of SA.
It is like saying "a horse is an animal with four legs". This sentence is surely readable, but it does not actually explains what a horse is; and will actually mislead the reader into thinking that any four-legged beast is a horse.
The text also implies that SA behaves like the greedy method, until it somehow detects that it has fallen into a local minimum; and only then takes some uphill moves. Obviously this is not correct; SA will take uphill moves at any time, and cannot tell whether the current state is a local minimum.
Finall, the text is badly written:

If P is a problem for which no better solution is known than an extensive brute-force search, then obviously SA cannot bebetter than brute-force!
Finding a set of 100 numbers that fit some characteristic is not a meaningful example: it may or may not require brute force and/or SA.
that is the best solution for its current region: what is "its current region"?
In an attempt to find a better solution, this technique may use various methods to jump out of the current pit: actually SA uses only one method, and uses it all the time.
such as searching for better solutions randomly by a factor of the inverse amount of the previous adjustment: this sentence is hard to understand, but seems to be saying that the magnitude of an uphill jump depends on how big was the previous downhill jump. This is false.

In conclusion, the "Plain English" section does not help the reader, and may even mislead him/her. All the best, Jorge Stolfi 04:17, 18 October 2007 (UTC)[reply]

Thanks for highlighting the issues with that section! All the best, -- itistoday (Talk) 05:36, 2 November 2007 (UTC)[reply]

On Links[edit]

Please dont include links to programs that claim to use SA unless there is some discussion on the target website of how the program specifically used the method, along with source code. —Preceding unsigned comment added by 209.56.2.2 (talk) 19:35, 17 July 2008 (UTC)[reply]

Physical analogy,again...[edit]

Hi, I am still having difficulty in understanding whether there is any real connection between SA and thermodynamics. (Perhaps I did not undertand the Metropolis-Hastings algorithm...)

SA has a set of "states", a variable called "temperature", a function E(s) of the state called its "energy". When it is in state s, the probability of it moving to some other state t is a function H(s,t,T). Note that H is not determined by the P() function alone, but also by the neighbor() generator, in a rather complicated way.

The Metropolis-Hastings (MH) algorithm aims to sample some "target" probability distribution D(s), which is known up to a constant factor. It uses a "proposal" distribution Q(s,t), that is a parameter of the method. When in a state s, it moves to a state t with a probability that depends only on D(s) and Q(s,t).

So here is the first question: is the behavior of SA in any way related to the behavior of the MH algorithm, for suitable choices of D and Q?

Second question: is there any resemblance between the distribution of states of SA, after it has been running for some time at constant temperature, and the state distribution of any physical system with the state energies defined by E()?

If the answer to both questions is "no", then why should we mention the MH algorithm and physics in this article?

All the best, --Jorge Stolfi (talk) 03:08, 25 October 2008 (UTC)[reply]

SA is an instance of MH under either of the two following scenarios:

1) Each state has the same number of neighbors. We choose randomly among neighbors with uniform probability, and greedily accept the neighbor if it has higher probability. This is equivalent to using an MH proposal distribution Q which is uniform over neighbors. Because each state has the same number of neighbors, Q is symmetric, and Q cancels in the MH acceptance formula. If D(x) is higher for the new state, the MH acceptance formula always accepts.

2) We consider neighbors serially, and choose to accept the neighbor or stay in the current state with probability proportional to D(x), without greedily accepting the neighbor if it has higher probability. This is equivalent to MH where the proposal distribution Q is proportional to D(x) restricted to the two states under consideration (the current state and the neighbor). In this case, the MH acceptance formula always accepts, but the proposed state may be the same as the current state.

In the second scenario, the MH proposal distribution is different at each step of the algorithm. However, detailed balance is maintained at every step of the chain. In both scenarios, as long as the chain is ergodic, it will converge to D(x) in the long term.

Sequential testing with greedy moves to neighbors of higher probability is not an instance MH. 173.225.52.218 (talk) 13:25, 29 March 2013 (UTC)[reply]

The physical analogy is important if nothing else as a source of inspiration for the algorithm and a mnemonic for understanding the algorithm - if it's not really accurate, that can be explained. The name is evidence enough that the original authors designed it by analogy to a physical system, or at least to their simplified model of a physical system. It doesn't have to be an accurate physical simulation. Dcoetzee 07:32, 25 October 2008 (UTC)[reply]

What? Confusing text[edit]

The text after the pseudocode listing is confusing. It seems to call into question the validity of the code without proposing a viable alternative. JKeck (talk) 15:11, 9 July 2014 (UTC)[reply]

About invention of simulated annealing[edit]

A procedure that is called now Simulated Annealing was developed by M. Pincus in 1970 who proposed to employ the Monte Carlo sampling based on Metropolis et al. algorithm (1953) to find the global minimum of a function of many variables.

A generalization of the statistical mechanics of a non-ideal gas by extending its application to non-physical multi-variant systems, which is not related to the Metropolis algorithm, was proposed by Khachaturyan et al ( 1979,1981). The authors introduced definitions of the non-physical thermodynamic functions, partition sum, free energy, temperature, and equilibrium densities as averages over the Gibbs ensemble of these systems.

That was done well before S. Kirkpatrick (1983,1985).

A number of users (including myself) tried to edit the intro section of the paper, but all changes were reverted by a certain "MrOllie". My last change from Jan 21 2020 which mentioned re M.Pinkus paper from 1970 and independent reference to it was reverted WITHOUT ANY EXPLANATION, by the same "MrOllie". When I complained about it to Wikipedia editors I received the following > > > However, since article content is not controlled by a central authority, we do not resolve editing disputes via email. In light of this, you have a few options: > > > 1.) If you think there has been a simple misunderstanding, you might begin a discussion on the article's talk page to try to engage other editors and get a sense of what is happening with this article. You can do this by clicking on the "Talk" tab located along the top of the article and then clicking on the "New Section" tab on the following page. Be sure to give your new section a title (like, "Concerns over article content", etc.) and once you write down your concerns you should sign the entry with four tildes ~~~~ before clicking on Save. Then you should check back occasionally to see if anyone has responded. This may be the simplest way to handle your issue.

I was also advised about an option to revert "MrOllie" changes, but I was warned that I might be blocked, if I do it. ( I was surprised that "MrOllie" wasn't blocked for very same reason).

Following advise of Wikipedia editors. I am putting this issue into the talk page. Any comments/advises ?

You can see changes related to Mr. Pincus and Prof. Khachaturyan in the history (Dec 2019 and Jan 2020). What's especially puzzling is that "MrOllie" reverted the changes nearly immediately, on one occasion within minutes. — Preceding unsigned comment added by SimulAnnT (talk • contribs)

If you want to give someone new credit for inventing this, you need secondary sources that explicitly say that. Without them, you are just engaging in original research, which we don't do on Wikipedia. - MrOllie (talk) 20:47, 3 June 2020 (UTC)[reply]

Let's start simply: Which source, specifically, says that "A procedure that is called now Simulated Annealing was developed by M. Pincus in 1970"? - MrOllie (talk) 15:56, 5 June 2020 (UTC)[reply]

A comprehensive review on the subject of Simulated Annealing by P.J.M. van Laarthoven, E.H.L. Aarts: " Simulated Annealing: Theory and Applications", in Mathematics and its application, Springer Science Business Media Dordrecht, 1987 has clearly and explicetly mentioned that work of Pincus (1970) preceded work by Kirkpatric on the subject. It's not an ORIGINAL research — Preceding unsigned comment added by SimulAnnT (talk • contribs)

Yes, I read that. I don't see it stating anywhere that Simulated Annealing was developed by Pincus. Please be specific, which text exactly supports that claim? Quotes or page numbers, it is a long enough book that readers should not be made to hunt for this. - MrOllie (talk) 20:00, 12 June 2020 (UTC)[reply]

Read again: footnote on the first page in Preface 1 Two earlier papers on optimization of continuous functions, [PIN70] and [KHA81], in which the analogy between statistical mechanics and optimization is already noticed, have remained widely overlooked.

And more (quote) from "Extended Pincus theorems and convergence of simulated annealing" by A. Charnes and M. Wolfe international Journal of System Science Volume 20,1989 - Issue 8

":Pincus’ 1968 formula for the (unique) global minimum of a continuous function on a compact set in E n is extended to finite multiple optima and to discrete and special variants. The impact of these on associated ergodic irreducible aperiodic Markov chain computation (Pincus 1970)—currently called ‘simulated annealing’—is exemplified and assessed leading to grave concern about what current simulated annealing processes may converge to, instead of optima.

I will add this reference to Further Reading section soon. — Preceding unsigned comment added by SimulAnnT (talk • contribs)

'Pincus noticed the anaology' is not remotely the same thing as 'Pincus developed simulated annealing.' In fact, your source says the opposite elsewhere: "An algorithm that follows the fourth approach was independently introduced by Kirkpatrick et al and Černy. It is generally known as simulated annealing", as well as the quote highlighted by User:Joel B. Lewis in his edit summary "The result is the tool called "simulated annealing", which, since its inception in 1982." I think you misunderstand what the source is saying. Also, your new Carnes citation calls van Laarthoven and Aarts a "misapprehension of history". I don't think using both makes any sense. - MrOllie (talk) 00:23, 13 June 2020 (UTC)[reply]

What is called “Simulated Annealing” now is a use of statistical mechanic Monte Carlo Metropolis algorithm for optimization of functions of multiple variables. M. Pincus did not just "noticed the analogy" as "Mr. Ollie" wrote. Pincus in his 1970 paper did develop the very method which used the Metropolis algorithm for finding global minimum of a function with multiple variables. The latter is even explicitly stated in a title of M. Pincus paper: ”A Monte Carlo Method for the Approximate Solution of Certain Type of Constrained Optimization Problems”.[xxx]

Kirkpatrick et al gave this approach a name named this approach “Simulated Annealing” and gave a credit for that in the edited version. A giving a new name is good but still not quite sufficient to give somebody a credit for method actually proposed a decade before. I doubt that scientific community share Mr. Ollie’s criterion of novelty...

Although given the foregoing, it's doesn't really matter but where exactly, in the quotation I brought from Carrnes and Wolfe, did they call Laarthoven and Aaarts review "misapprehension of history" ? These words are nowhere in a quote I brought.

Did they call the entire, quite lengthy, review "misapprehension of history" ? Or specific parts of it ? Provide, please, a FULL quote with CONTEXT, not just two words out of it.

The previous version is restored.

Actually, it is hard to understand why multiple revisions by Mr. Ollie which spans for over a year have one common feature, viz., to hide existence of earlier publications on the topic, both Pincus (1970) and Khachaturyan et. (1979,1981) This contradict to the Wikipedia appeal to add new relevant references, and, specifically, this distorts the history of development of the method.

User: SimulAnt 22:57 July 1 2020

That's not what *I* wrote, that is what *your source* wrote. We cannot go beyond what the sources say and substitute our own interpetations. The 'common feature' of my edits is to strictly follow what the sources actually say, not your overly generous interpretation of them. - MrOllie (talk) 12:06, 2 July 2020 (UTC)[reply]

From Metaheuristic Optimisations. 7. Simulated Annealing by Prof. Thomas Weise, Heifei University

Idea initially developed by Pincus 1970 , Khahaturyan (1979, 1981). Later also described by Kirkpatrick et al. [3] in 1983 , Cern´y ˇ [4], Jacobs et al. [5, 6], User: SimulAnt 06:21 2 July 2020 — Preceding unsigned comment added by SimulAnnT (talk • contribs) 13:21, 2 July 2020 (UTC)[reply]

Independently described[edit]

User:SimulAnnT, the sources you introduced noted that this technique was independently arrived at by several authors. Your version obscures this and suggests that everyone was copying Pincus. We must follow the sources on this.

Even setting that aside, you are adding a large amount of text to the lead section. Per MOS:LEAD, the lead should be a short summary of the rest of the article and not contain unique information. - MrOllie (talk) 16:36, 19 August 2020 (UTC)[reply]

"Exact algorithm"[edit]

The article says "simulated annealing may be preferable to exact algorithms such as gradient descent or branch and bound". I didn't know what an exact algorithm is, but GPT-4 and google both indicate that it's one that finds the optimal solution given infinite time. Gradient Descent doesn't find the optimal solution given infinite time (will get stuck in a local optimum), so I don't get the sentence. Can someone explain? 2A02:8071:7141:6400:647B:F5D6:9075:304C (talk) 09:04, 2 April 2024 (UTC)[reply]

Exact algorithm is not a term of art AFAIK and so your search isn't giving good results. A contrast is being made with non-statistical algorithms. Perhaps a better term would be Heuristic algorithm but I haven't checked to see if this is supported by reliable sources. ~Kvng (talk) 15:52, 9 April 2024 (UTC)[reply]