Local Search (Heuristic)

broad part of [[111.00_anchor]]

![[Pasted image 20230109152041.png]] Given the example of a real-valued function :: How would we find the global minimum of this function?

A first attempt would be taking the first derivative, searching for all x where f'(x) = 0, and then taking the x with the smallest function value f(x).

Another numeric solution:

  • start at an arbitrary point
  • compute the derivative and walk a small step downhill
  • proceed until we have found a local optimum. However, we cannot guarantee that this optimum is the global one, so the algorithm is far from perfect (a minimal sketch follows).
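A minimal sketch of this descent idea in Python (the example function, the step size and the stopping threshold are illustrative assumptions, not from the lecture):

```python
def gradient_descent(f_prime, x, step=0.01, eps=1e-8, max_iter=100_000):
    """Walk small steps downhill along the derivative until the slope (nearly) vanishes."""
    for _ in range(max_iter):
        slope = f_prime(x)
        if abs(slope) < eps:   # derivative ~ 0: we sit in a local optimum
            break
        x -= step * slope      # small step downhill
    return x

# f(x) = x^4 - 3x^2 + x has several local minima; which one we
# end up in depends entirely on the arbitrary starting point.
x_local = gradient_descent(lambda x: 4 * x**3 - 6 * x + 1, x=2.0)
```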

More general principle :

Local Search :

  • Start with some initial solution - usually randomly selected
  • explore the local neighborhood and try to find a better solution in this neighborhood.
  • If we’ve found a better solution, we take it.
  • We stop if we cannot find any better solution in the local neighborhood than the one we already have.
  • We may repeat that procedure with several initial solutions.

With that scheme we move from one solution to a better neighboring one, step by step, hoping to approach the global optimum (a generic sketch follows the figure below).

![[Pasted image 20230109152536.png]]
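Written generically, the scheme might look like this (a sketch; `initial`, `neighbors` and `cost` are assumed to be problem-specific callables supplied by the caller):

```python
def local_search(initial, neighbors, cost, restarts=1):
    """Generic local search: move to a better neighbor until none exists.
    initial() yields a (random) starting solution,
    neighbors(s) enumerates the local neighborhood of s."""
    best = None
    for _ in range(restarts):          # optionally repeat with several starts
        s = initial()
        improved = True
        while improved:
            improved = False
            for n in neighbors(s):
                if cost(n) < cost(s):  # found a better solution: take it
                    s, improved = n, True
                    break
        # loop left: no neighbor is better than s, a local optimum
        if best is None or cost(s) < cost(best):
            best = s
    return best
```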

Example : Traveling Salesman ::

Local search for the TSP :

Recall the traveling salesman problem. We are given a graph with weighted edges; the edge weights are interpreted as distances here. We have to find a tour through that graph that has the smallest possible length.

==Approach== :

  • Start with an arbitrary tour - the initial tour through the graph is usually chosen at random
  • Define a “local change” - here: exchanging the positions of two points on the tour. By swapping two points we hope to shorten the resulting path.
  • Then we check for each pair of points whether the tour gets shorter if we swap these points
  • We do that until no further improvement is possible (a sketch follows the figures) ![[Pasted image 20230109152955.png]] ![[Pasted image 20230109153003.png]] ![[Pasted image 20230109153012.png]]
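A sketch of this procedure in Python (representing a tour as a list of city indices and the distances as a matrix `dist` are illustrative assumptions):

```python
def tour_length(tour, dist):
    """Length of the closed tour; dist[i][j] = distance between cities i and j."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_swap_local_search(tour, dist):
    """Swap pairs of cities as long as some swap strictly shortens the tour."""
    tour = list(tour)
    best_len = tour_length(tour, dist)
    improved = True
    while improved:                # repeat until no swap helps anymore
        improved = False
        for i in range(len(tour)):
            for j in range(i + 1, len(tour)):
                tour[i], tour[j] = tour[j], tour[i]      # tentatively swap
                new_len = tour_length(tour, dist)
                if new_len < best_len:                   # strictly better: keep it
                    best_len, improved = new_len, True
                else:
                    tour[i], tour[j] = tour[j], tour[i]  # undo the swap
    return tour
```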

This is a local search:

We create an initial state and then improve the solution step by step through local changes. Once no local change improves the tour any further, we have found the best solution within our local neighborhood - a local optimum.

Whether this procedure terminates depends on the overall implementation ::

  • if we have negative cycles in the graph, it will not terminate
  • if we require that each solution must be strictly better than the previous one, we will terminate at some point - there are only finitely many tours
  • if we do not check for strict improvement, then we cannot guarantee termination.

==Improving the solution==

  • we can increase the size of the neighborhood, i.e. swap more than just two points per step (see the sketch below).
    • By increasing the neighborhood we explore more possibilities, and are thus able to find better solutions that would not have been found in the smaller scope.
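A sketch of such an enlarged neighborhood for the TSP, rearranging the cities at k chosen positions (the list representation of a tour is an assumption carried over from the sketch above):

```python
from itertools import combinations, permutations

def k_swap_neighbors(tour, k):
    """All tours reachable by rearranging the cities at k chosen positions.
    For k = 2 this is exactly the pairwise swap from before."""
    for positions in combinations(range(len(tour)), k):
        for perm in permutations(positions):
            if perm == positions:
                continue               # skip the identity rearrangement
            neighbor = list(tour)
            for src, dst in zip(positions, perm):
                neighbor[dst] = tour[src]
            yield neighbor
```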

==Tradeoff==

  • The larger the “neighborhood”, the more complex the search gets - finding the best swap among k points means one round of swapping takes O(n^k)
  • But the larger the neighborhood, the better the local optimum is going to be.

In order to escape the traps of local optima, we may have to adapt the size of our neighborhoods.

Principle : we start with a large neighborhood and make it gradually smaller over time.

Inspired by physics of crystallization :

  • We start in a liquid state where particles can move freely
  • We cool the system down; the particles move into more and more regular positions
  • Annealing (ausglühen/aushärten = to anneal/harden) >> we do not want the system to freeze too fast; slow cooling lets it settle into a regular, low-energy state

We apply that principle to algorithms::

  • we have a parameter called temperature T
  • at the beginning, T is large - we decrease it over time
  • The larger T, the more freedom we have to walk through the search space - even if we worsen our current solution
  • As T decreases, we can only move towards better solutions in the search space

Simulated Annealing - Idea :

let s be any starting solution  
repeat  
	randomly choose a solution s′ in the neighborhood of s  
	if ∆ = cost(s′) − cost(s) is negative:  
		replace s by s′  
	else:  
		replace s by s′ with probability e^{−∆/T}

With a high temperature T, we are likely to jump to the new position, because ∆/T is close to 0 and thus e^{−∆/T} is close to 1. With a large ∆, we will probably not jump to the new position, because the acceptance probability becomes small.
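A minimal Python version of one such step at a fixed temperature T (the callables `random_neighbor` and `cost` are illustrative assumptions):

```python
import math
import random

def sa_step(s, random_neighbor, cost, T):
    """One simulated-annealing step: always accept improvements,
    accept a worse solution only with probability e^{-∆/T}."""
    s_new = random_neighbor(s)
    delta = cost(s_new) - cost(s)
    if delta < 0 or random.random() < math.exp(-delta / T):
        return s_new  # jump: downhill always, uphill only sometimes
    return s          # otherwise keep the current solution
```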

The probability to jump to a “bad” state therefore depends on two things ::

  1. the current temperature T
  2. how much worse the new state is compared to the current one -> measured by ∆

Now we apply an “annealing schedule”: we slowly decrease T.

At the beginning we start with a large T, so the algorithm can move almost freely through the search space. We then reduce the temperature over time: the lower T, the smaller the probability of escaping from a local optimum. A minimal schedule is sketched below.
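One common choice (an assumption here, not the only option) is geometric cooling, wrapped around `sa_step` from above; the start temperature, cooling factor and step count would have to be tuned per problem:

```python
def simulated_annealing(s, random_neighbor, cost, T=10.0, alpha=0.999, steps=10_000):
    """Geometric annealing schedule: T shrinks by a factor alpha each step,
    so uphill jumps become less and less likely over time."""
    for _ in range(steps):
        s = sa_step(s, random_neighbor, cost, T)
        T *= alpha  # slowly decrease the temperature
    return s
```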

comments regarding Simulated annealing ::

  • Simulated annealing is somewhat of an art - finding good annealing schedules is not easy
  • But sometimes it works quite well, which is why it is quite popular.

Discussing local Search ::

Local search always ends in a local optimum - where “local” refers to the neighborhood relationship, i.e. the set of solutions that was actually explored. > local optimum = best solution within a given neighborhood

Typically we have no guarantee of ending in the global optimum, and also no guarantee of how much better the global optimum might be compared to our solution. This is primarily because we cannot be sure the global optimum was ever covered by the explored neighborhoods.

approximating success ::

The success of local search depends primarily on the choice of the neighborhood. ==main strategy to define neighborhoods==: apply local transformations to the current solution. For example, in a graph cut problem, it is natural to define the neighborhood of a particular partition as the set of all partitions that agree with the given one in all but a single vertex. In other words, the allowed transformations are what defines the neighborhood.
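A sketch of that graph-cut neighborhood (representing a partition as a dict from vertex to side is an assumption for illustration):

```python
def cut_neighbors(partition):
    """All partitions that agree with `partition` in all but a single vertex.
    partition maps each vertex to its side of the cut, 0 or 1."""
    for v in partition:
        neighbor = dict(partition)
        neighbor[v] = 1 - neighbor[v]  # move one vertex to the other side
        yield neighbor
```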

Time-Accuracy-Trade offs ::

With small neighborhoods we get :

  • a neighborhood that is easy to search, yet more local optima - the algorithm may stop at a rather poor solution

With large neighborhoods we get :

  • more computational time spent searching through the neighborhood
  • yet better chances of finding a good optimum

An alternative would be :: construct a large neighborhood, yet do not search through it exhaustively. Instead, apply some fast search procedure - as greedy algorithms do - that only inspects certain parts of the neighborhood (sketched below).
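One way such a non-exhaustive, greedy-style search could look (a sketch; sampling 100 random neighbors is an arbitrary assumption):

```python
def sampled_best_neighbor(s, random_neighbor, cost, samples=100):
    """Inspect only a random sample of a large neighborhood instead of
    searching it exhaustively, and take the best candidate seen."""
    candidates = [random_neighbor(s) for _ in range(samples)]
    return min(candidates, key=cost)
```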


Takeaways ::

  • Local search is about the first thing one can try when solving an optimization problem.
  • In many cases it leads to pretty poor solutions. We can increase the overall quality with multiple restarts - increasing the number of searches - or by adding some heuristics.
  • Whenever we have a more direct algorithm available, ==we should prefer that one!==
  • Local search is better than having nothing at all, so see it as a ==last resort==. It is especially useful for the many problems where no efficient algorithm is available - NP-hard problems etc.
    • ==Be aware of its limitations==
  • In some sense, local search is also a greedy algorithm :: it only moves on to better solutions and never traces back to a “previous/worse” one
  • There exist many variants of local search, and also more sophisticated local-search principles such as simulated annealing. Yet their efficiency and usefulness also depend on the context and application.