Online algorithms and competitive Analysis

part of [[111.00_anchor]]


Definition

#def

“Standard” offline algorithm ::

  • When algorithm starts it has complete knowledge about the entire input
  • Then the algorithm computes its result and returns one output

In many scenarios this is an unrealistic scenario ::

  1. ==Scheduling== want to schedule a number of jobs to a number of machiens, continuously. We have jobs arriving one by one and have they have to be scheduled immediately, without knowing aobut future jobs.
  2. Paging in an operating system: decide which “page” to keep in the RAM and which ones to move to the the disk. We want to keep those pages in the RAM that are going to be used the most in future – at least we would like those to be taken – but we dont know the future requests yet!
  3. Data structures:: a tree or heap is goig to be maintained throughout the running time of an algorithm. We do not konw in advance which elements we are goign to insert or delete.

Setup of an online algorithm

  • Algorithm A gets a request sequence of inputs.
  • At the time of request it knows all the past requests for but no the future requests
  • When the algorithm serves request it incurres costs . These costs can be running time, but in many contexts they are also something else. For example, in scheduling the costs could be the number of machiens that are idle at a given point in time, or the delay by which a request is served/
  • THe cost of the entire request seqeuence is the sum of the individial requests.

And online algorithm is always able to process the previous events and data, yet never the future !

Competitiv analysis ::

We want to compare the online algorithm with an optimal offline algorithm:: The offline algorithm knows the entire sequence from the very beginning It then serves all the requests in the same order as the online algorithm, but it is allowed to make use of the knowledge of the whole sequence - in particular of future requests.

An example could be paging THe algorithm konws which are the pages that are going to be requestesd in the future. So it can easily decide which ones should be kept in the worknig memory.

Evaluating online algorithms ::

Definitino of competitiveness ::

  • Denote by the cost incurred by the best possible offline algorithm, that is the minimal cost incurred by all possible offline algorithms on the sequence
  • Denote by the costs of online algorithm A on input sequence
  • An algorithm is called c-competitive if there exists a constant a such that for all requests Observer: this is a worst case statement we do not argue about any kind of average case- which may invovle how likely a sequence is gonna be

We define the online algorithm once for all sequences and should also be able to apply it for every occurring sequence at a later time

Ski Rental problem ::

Scenario ::

We go skiing for the first time and depending on whether you like it or not you might go skiing more often in the future Renting the equipment for a day costs 50€, buying the equipment costs 500€. At each day you can decide :: rent or buy

We could potentially solve the problem with the following system ::

  • We encode the events “to ski” or “not to ski” by a sequence with or one per day. At each day we decide whether to buy them or not.
  • WE encode our rent-or-buy stategy by one number : the strategy consists of renting the equipment on days and buying skis on day t.

defines the first strategy :: buying skiis on first day

We have several possible outcomes now:

  1. We never go skiing again after the first day : Then OPT hence the competitive ratio
  2. Assume we only go skiing once more after the first day : . Then we can conclude :: and our ratio is 5
  3. We can condlude that the ratio is increasing whenever we go less skiing, and decreases whenever we go driving more and more.

defines the second strategy :: we buy on day 2 ::

It is easy to see that the worst-case- is that we stop skiing after that given day. We have spent 50 euros on the first day and 500 on the second day - 550 together then - bu the optimal solution would have been ro rent both times - 100 together. Hence the competitive ratio is 5.5.

Next, consider strategy buy on day t for . It is easy to see that the worst-case- is to stop skiiing after this day, and that the optimal strategy would be to rent t times. THis leads to the competitive ratio of : THe smallest value occurs for t=10 achieving ratio of 1.9

we ought to consider three types of sequences :

  1. sequences where you ski less than t days
  2. sequences where you ski exactly t days
  3. sequences where you ski more than t days

Finally we consider strategy : buy on day t for t >10 Here it is easy to see that the worst-case- is tht we never go skiing again but the optimal strategy would be to buy the equipment on day 1. This would lead to a competitive ratio of:

Conclusion ::

==We achieve the best ratio== when we buy on day 10, which gives a competitive ratio fo c= 1.9 >> we cant get lower than this

Note again the intuition :: We look at the worst case mostly. In the worst of all worlds, if you have no control whatsoever on whether you will go skiing ever again, with strategy you cannot loose more than a factor 1.9 of money. compared to the optimal strategy.

More generally ::

  • Online algorithms are usually used for buy-vs-rent problems :: Keep renting until what we have spent equals the cost of buying. After that buy.
  • This algorithm is 2-competitive - with a = 0 - by the same argument as above, for any sequence we occur at most twice as much costs as the optimal algorithm ==we are only observing the worst case analysis==

List update, caching and paging problem ::

  • we would like to maintain a list of items
  • Requests are to access, insert or delete an item from the list.
  • Exchange operation: requested item may be moved to somewhere else, or two items may be exchanged
  • Question: what is the best strategy to maintain the list such that all these operations are “as cheap as possible” in the sense of competitiveness.

if items are accessed often, we should retrieve them easier than the others

What might be a good strategy to maintain the list ?

_There are some approaches available :: _

  1. ==Move-to-front== always move the requested item to the lists front
  2. ==Transpose== Exchange the requested item with the item in front of it - and thus moving it one position further to the front
  3. ==Frequency count== for each itme in the list maintain a frequency count. Maintain the list in such a way that the frequency count is decreasing.

which one is the best ?

==Move-to-front== is the best solution, with 2-competitive

Transpose and frequency-count are not c-competitive for any constant c.


The marriage problem ::

Assume we’ve joined a dating site. Just from looking at profiles of potential partners, you identify n persons that you would like to date. Ideally, yo would like to “try” them all before picking one –> the best probably

How many ppl should we observer before deciding ?

This problem is also called the ==secretary problem==

informal definition ::

We are going to analyze the following strategy :

  1. order the people in some random order
  2. Phase1 : Go out with T people and ditch all of them – we call it ==calibration phase== and it is used to find an average >> late rwe search for all that are better than the first phase
  3. Phase 2 : Keep on going out with more people. As soon as you find someone who is better than all the candidates of phase 1 ::Take them!

in this version we never choose the people of the calibration phase again >> even if we have someone who’s the best in there.


formal definition ::

Assume the people are ordered in some random number. We denote the permuttation that re-orders people according to how you like them: i is better than j Goal is to find the candidate j with Our dating strategey is denoted with:: select the smallest j >t for which

We would like to compute the probability that , which is the probability for the best candidate.

The required calculations::

We define P as the optimal candidate:

Evaluating ::

Are permutations are equally likely to occure, because

Evaluating

We consider the condition that for the given j we have The our strategy chooses j if: j is the best candidate among is worse than any of the candidates in 1 to t- which is the calibration set

the best among 1,…,j-1 is contained in 1,…,t The minimum of above set is equally likely to occur at any of the j-1 positions. In particular, the likelihood that it occurs within the first t elements is

Result ::

We obtain the total equation of Further calculations - exploit properties of partial sums of harmonic series lead to ::

Result::

Our choosen dating strategy , defines the overall likelihood to obtain the best candidate with : Easy to see that this term is maximal for t = , which give a probability of

This solution aint the best and there are several extensions available to improve the algorithm ::

  • reduce the search for the top10 best people
  • we require to know n prior to calculate everything