Finding the median ::

<< Mittlerer Datenpunkt aus einem Sortierten System>>

The median element of a sorted sequence of integers a1,….,an is the element with index [n/2]: the element in the middle. The median is important in many applications: ==it is interpreted as the typical== element of the list. Note the difference between median and mean ::

1,1,1,1,1,,1,1….100 times = 1000 hier ist Mittlewert falsch bzw nicht representativ.

Median ist 1 Mittlewert ist ~11 und somit nicht repräsentativ für dieses Beispiel.

Bei diversen Statistiken ist es womöglich nicht sinnvoll den Mittlewert zu nehmen, wenn ein Großteil der Elemente ein und das Selbe Element darstellt. Angenommen man hätte N StudentInnen mit einem Gehalt von 700-1000 und einen Prof mit 3k Gehalt im Raum. Durch die eine Person steigt der Mittlewert sehr stark an und ist nicht mehr representativ.

Median and selection problem ::

Given an unordered sequence of n elements, we want to find the median element.

==Generalization of the given problem==:: Given an unordered sequence of n elements, we want to find the k-th smallest element.

Solving with divide and conquer ::

Algorithm ::

  • At each step in time, select a “splitter element” s and split the input array A in three pieces A_, A+ and A= ::
    • A_ containing everything below splitting element
    • A+ containing everything above splitting element
    • A= containing everything equal to splitting element
  • Then, depending on the sizes of A_, A= A+ we know in which of the two parts A_ and A+ we have to continue our search::

:: Example of idea :: ![[Pasted image 20221128152551.png]]

Select(A,k)
	Choose a splitter elmenet s - according to prespecified rule, see later ) 
	for all a in A : 
		if a < s : put a in A_, end 
		if a = s : put a in A= , end 
		if a > s : put a in A+ , end 
	if |A_| < k <= |A_| +|A=| : return s, end 
	if |A_| > k : return Select(A_,k) # continue searching in the subsets where we may find the median next 
	if(|A+| + |A=| < k) : return Select(A+, k-|A_| - |A=|) end 

456363897 Splitter element 3 :: A_ = [] A+ = [ 4,5,6,7,8,9,7] A= [ 3,3]

we have to take the third case ::

Select(A+, 3) as splitter element 6 :: A_ = [4,3,5 ] A+ = [9,8,7] A= = [6,6] ==we have discovered the median of 6==

worst splitterelement ::

  • assume we want to find the median of the sequence
  • Bue we have a stupid rule for selecting the splitter, it always chooses the maximum element in the sequence.
  • Then in each step the size of A_ decreases by 1. IN each round we need to compare against all elements in A_. We need to do so until A_ has size about n/2, then we have found the median.
  • $(n-1)+(n-2)+\ldots n/2= \teta (n^{2})$

Avoiding this worst case and implement a better strategy ?

bestes Splitterelement bestimmen ::

Wie bestimmt man das beste Splitterlement für den Algorithmus ?

  • take a random entry and hope for it to work well > fast because no computation
  • take the mean - mittlewert - because its close to median
  • choose a splitter element that results with |A_| ~ |A+| for easily continuing afterwards.

Solving Search for best splitter element ::

[[111.99_algo_randomized_algorithms]]