
Merge Sort

Concept:

Divide and conquer:

Let the merge sort recursive function be F. Call F on the left half of the array and then on the right half, so that each half is sorted recursively.

Now merge the two sorted halves into one completely sorted array.
Merging takes O(n) and the depth of the recursion tree is log(n).
Complete complexity: O(n log n).



def merge_sort(arr):

	_merge_sort(arr, 0, len(arr) - 1)
	return arr

def _merge_sort(arr, left_index, right_index):

	if left_index < right_index:
		mid_index = (left_index + right_index) // 2

		_merge_sort(arr, left_index, mid_index)
		_merge_sort(arr, mid_index + 1, right_index)

		merge_two_sorted_arrays(arr, left_index, mid_index, mid_index + 1, right_index)


def merge_two_sorted_arrays(arr, left_start_index, left_end_index, right_start_index, right_end_index):

	result = []
	i = left_start_index
	j = right_start_index

	# Compare the fronts of the two sorted runs and take the smaller element.
	while i <= left_end_index and j <= right_end_index:
		if arr[i] < arr[j]:
			result.append(arr[i])
			i += 1
		else:
			result.append(arr[j])
			j += 1

	# Append whatever remains of either run.
	result.extend(arr[i:left_end_index + 1])
	result.extend(arr[j:right_end_index + 1])

	# Write the merged run back into the original array; rebinding with
	# `arr = result` would only change the local name, not the caller's list.
	arr[left_start_index:right_end_index + 1] = result
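For comparison, the same divide-and-conquer idea can be written more compactly with slicing. This version allocates new lists at every level, so it trades memory for brevity (the in-place version above avoids that):

```python
def merge_sort_simple(arr):
	# Base case: a list of 0 or 1 elements is already sorted.
	if len(arr) <= 1:
		return arr

	mid = len(arr) // 2
	left = merge_sort_simple(arr[:mid])    # recursively sort the left half
	right = merge_sort_simple(arr[mid:])   # recursively sort the right half

	# Merge the two sorted halves.
	merged = []
	i = j = 0
	while i < len(left) and j < len(right):
		if left[i] < right[j]:
			merged.append(left[i])
			i += 1
		else:
			merged.append(right[j])
			j += 1
	merged.extend(left[i:])
	merged.extend(right[j:])
	return merged

print(merge_sort_simple([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```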






Remove duplicates from LinkedList

Question:

Given a sorted linked list, delete all duplicates such that each element appears only once.

For example,
Given 1->1->2, return 1->2.
Given 1->1->2->3->3, return 1->2->3.

Concept:

Keep track of all values that have already been visited (using a set) and maintain a previous-node pointer while moving forward through the list.
Only if the current node's value is new, set prev_node.next = current_node (and update prev_node and current_node).

Otherwise don't connect the previous node to the current node (and just advance current_node = current_node.next).

And in the end, set prev_node.next = None to drop any trailing duplicates.

'''
Given a sorted linked list, delete all duplicates such that each element appears only once.

For example,
Given 1->1->2, return 1->2.
Given 1->1->2->3->3, return 1->2->3.
'''


def remove_dups(head):
	my_set = set()
	prev = None
	while head:
		if head.val not in my_set:
			if prev is not None:
				prev.next = head
			my_set.add(head.val)
			prev = head
		head = head.next
	if prev is not None:
		# Drop any trailing duplicates.
		prev.next = None

import scratch_pad

a = scratch_pad.a 
remove_dups(a)
scratch_pad.print_list(a)


But we didn’t use the property that the list is already sorted. Can we use this information to optimize our code further?

Yes: instead of using a set, we can simply compare against the previous node's value and keep moving forward until a different value is reached, then connect the two nodes.

Here is the improved code:

def remove_dups_without_space(head):
	prev = None

	current_node = head
	while current_node:
		if prev is not None:
			# Skip over every node that repeats the previous value.
			while current_node is not None and prev.val == current_node.val:
				current_node = current_node.next

		if prev:
			prev.next = current_node
		if current_node:
			prev = current_node
			current_node = current_node.next
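To try this out without the `scratch_pad` module (which isn't shown here), a minimal node class and list helpers can stand in; `remove_dups_without_space` is repeated so the snippet runs on its own:

```python
class ListNode:
	"""Minimal singly linked list node (a stand-in for the scratch_pad helper)."""
	def __init__(self, val):
		self.val = val
		self.next = None

def build_list(values):
	head = None
	for v in reversed(values):
		node = ListNode(v)
		node.next = head
		head = node
	return head

def to_values(head):
	out = []
	while head:
		out.append(head.val)
		head = head.next
	return out

def remove_dups_without_space(head):  # repeated from above for a standalone run
	prev = None
	current_node = head
	while current_node:
		if prev is not None:
			while current_node is not None and prev.val == current_node.val:
				current_node = current_node.next
		if prev:
			prev.next = current_node
		if current_node:
			prev = current_node
			current_node = current_node.next

head = build_list([1, 1, 2, 3, 3])
remove_dups_without_space(head)
print(to_values(head))  # [1, 2, 3]
```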


LinkedList Palindrome

Since it’s a singly linked list (no `prev` pointer) we can’t use two pointers moving inward from both ends to check for a palindrome in a single pass.

We can, however, break the list at the middle, reverse the second half, and then compare each element of the two halves. But we must be careful with odd and even lengths.

We need the following functionality:
1. Split the list at the middle.
2. Reverse one half.
3. Compare the two halves.

To get the middle element we can use the tortoise-and-hare approach.

def get_middle(head):
	slow = head
	fast = head
	slow_prev = None

	while fast is not None and fast.next is not None:
		slow_prev = slow
		slow = slow.next
		fast = fast.next.next

	return slow, slow_prev

This will return the middle node. In the case of an odd-length list, the second half gets one extra element. We don’t have to worry about it: after reversal that element ends up last, and the comparison loop simply stops once the shorter first half is exhausted.

Here is a simple reversal routine that takes the middle node as its starting point.

def reverse(current_node):
	prev = None

	while current_node:
		next_node = current_node.next
		current_node.next = prev
		prev = current_node
		current_node = next_node

	return prev 

But we have to “break” the first half off from the second half. That’s why our get_middle returned the previous node too, so we can set prev.next = None.

Now we have pointers to the head of each half and can compare them node by node. Here is the complete code:



def get_middle(head):
	slow = head
	fast = head
	slow_prev = None

	while fast is not None and fast.next is not None:
		slow_prev = slow
		slow = slow.next
		fast = fast.next.next

	return slow, slow_prev


def reverse(current_node):
	prev = None

	while current_node:
		next_node = current_node.next
		current_node.next = prev
		prev = current_node
		current_node = next_node

	return prev 


def is_list_palin(head):

	middle_node, middle_prev = get_middle(head)

	new_middle_node = reverse(middle_node)
	if middle_prev is not None:
		# Detach the first half from the reversed second half.
		middle_prev.next = None

	A = head
	B = new_middle_node

	while A and B:
		if A.val != B.val:
			return False
		A = A.next
		B = B.next

	return True


import scratch_pad

print(is_list_palin(scratch_pad.a))
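For a self-contained run without `scratch_pad`, the three steps can be folded into one function (the `Node` and `build` helpers below are made up for the demo):

```python
class Node:
	def __init__(self, val):
		self.val, self.next = val, None

def build(values):
	head = None
	for v in reversed(values):
		n = Node(v)
		n.next = head
		head = n
	return head

def is_palindrome(head):
	# Step 1: find the middle with tortoise and hare.
	slow = fast = head
	slow_prev = None
	while fast is not None and fast.next is not None:
		slow_prev = slow
		slow = slow.next
		fast = fast.next.next
	# Step 2: detach the first half, then reverse the second half.
	if slow_prev is not None:
		slow_prev.next = None
	prev = None
	while slow:
		slow.next, prev, slow = prev, slow, slow.next
	# Step 3: compare the halves; the loop stops at the shorter one.
	a, b = head, prev
	while a and b:
		if a.val != b.val:
			return False
		a, b = a.next, b.next
	return True

print(is_palindrome(build([1, 2, 3, 2, 1])))  # True
print(is_palindrome(build([1, 2, 2, 1])))     # True
print(is_palindrome(build([1, 2, 3])))        # False
```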

Minimum window substring:

Found this question on Leetcode.

Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

For example,
S = "ADOBECODEBANC"
T = "ABC"
Minimum window is "BANC".

The challenge is that we have to do it in O(n) time.

Concept:
Maintain a window [i, j], where i is the start index and j the end index of the current substring.

Keep extending the right side of the window until all characters of T are covered by the window S[i..j].

Then start closing the window from the left side, for as long as all characters of T are still inside the S[i..j] window.

Since there are only 26 characters, it takes O(1) time to check coverage (via a count array).
Here is the Python implementation of this problem.



'''
Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

For example,
S = "ADOBECODEBANC"
T = "ABC"
Minimum window is "BANC".
'''


def min_window_substr(S, T):
	substr_char_map = fill_char_map(T)

	char_map = [0] * 26
	i = 0
	min_window_length = float('inf')
	best_start = 0

	for j, each_char in enumerate(S):

		char_map[ord(each_char) - ord('A')] += 1

		if covers_all_chars(char_map, substr_char_map):
			# Close the window from the left while it still covers T.
			while covers_all_chars(char_map, substr_char_map):
				char_map[ord(S[i]) - ord('A')] -= 1
				i += 1

			# We went one step too far; move back so the window covers T again.
			i -= 1
			char_map[ord(S[i]) - ord('A')] += 1
			if j - i + 1 < min_window_length:
				min_window_length = j - i + 1
				best_start = i

	if min_window_length == float('inf'):
		return ""
	return S[best_start:best_start + min_window_length]

def fill_char_map(T):
	char_map = [0] * 26
	for each_char in T:
		char_map[ord(each_char) - ord('A')] += 1
	return char_map

def covers_all_chars(map1, map2):
	for i in range(26):
		if map2[i] >0 and map2[i] > map1[i]:
			return False
	return True

S = "ADOBECODEBANC"
T = "ABC"

print(min_window_substr(S, T))
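The count array above assumes S and T contain only uppercase A-Z. The same sliding-window technique generalizes to arbitrary characters with a Counter; this variant (a generalization, not from the original text) returns the window itself:

```python
from collections import Counter

def min_window(S, T):
	need = Counter(T)
	missing = len(T)   # characters of T still missing from the window
	best = ""
	i = 0
	for j, ch in enumerate(S):
		if need[ch] > 0:
			missing -= 1
		need[ch] -= 1   # counts below zero mark surplus characters
		if missing == 0:
			# Shrink from the left while the leftmost character is surplus.
			while need[S[i]] < 0:
				need[S[i]] += 1
				i += 1
			if not best or j - i + 1 < len(best):
				best = S[i:j + 1]
	return best

print(min_window("ADOBECODEBANC", "ABC"))  # BANC
```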


Max contiguous subarray

The max sum of a contiguous subarray is the maximum over the sums of all possible subarrays.

The brute-force method, however, is very time consuming: there are O(n^2) subarrays, and summing each one takes another O(n), so the overall time complexity is O(n^3), which is disastrous.

We can however improve it with divide and conquer.

The max contiguous subarray sum is going to be max(left_half, max_subarray_sum_crossing_mid, right_half).

left_half is simply the answer for the left half of the array. Same for right_half.

The max subarray which definitely includes arr[mid] is easy to calculate: extend left from mid keeping the best running sum, extend right from mid + 1 keeping the best running sum, and add the two.

Now we know the answer for the left half, the right half, and the subarray crossing the mid element. The overall result will be the max of all three.

Here is the implementation:


def max_contiguous_array(arr, start_index, end_index):
	if start_index == end_index:
		return arr[start_index]
	elif start_index > end_index:
		return float('-inf')
	else:
		mid = (start_index + end_index) // 2

		left_subarray_max_sum = max_contiguous_array(arr, start_index, mid - 1)
		right_subarray_max_sum = max_contiguous_array(arr, mid + 1,end_index)

		current_sum = current_subarray_sum(arr, start_index, mid, end_index)

		return max(left_subarray_max_sum, right_subarray_max_sum, current_sum)


def current_subarray_sum(arr, left_index, mid, right_index):

	i = mid 
	j = mid + 1
	current_sum = 0
	left_max_sum = float('-inf')
	right_max_sum = float('-inf')
	while i >= left_index:
		current_sum += arr[i]

		left_max_sum = max(left_max_sum, current_sum)
		i -= 1
	current_sum = 0
	while j <= right_index:
		current_sum += arr[j]

		right_max_sum = max(right_max_sum, current_sum)
		j += 1

	return left_max_sum + right_max_sum 


arr = [-2,1,-3,4,-1,2,1,-5,4]

print(max_contiguous_array(arr, 0, len(arr) - 1))


At every level of the recursion the total work adds up to O(n): one call over n elements, then two calls over n/2 elements each, then four over n/4, and so on.

There are log(n) levels in the recursion tree.
So the total runtime complexity is n * log(n). Equivalently, the recurrence T(n) = 2T(n/2) + O(n) solves to O(n log n).


LRU Cache

The least recently used (LRU) cache is a caching mechanism which, when the cache is full, flushes the data that was least recently used to make space for new data to be pushed into the cache.

It’s used by memcached, page-replacement algorithms in operating systems, etc.

Concept:
We can use a deque (a doubly linked list). We insert new data at the front of this deque. This takes O(1) time.

If the deque reaches max capacity, we delete the last node (rear) from the deque and insert the new one at the front. Again this takes O(1) time.

For retrieving a value we would have to search the whole deque for the key, which takes O(n) time. If we maintain a hash map with the key as the hash-map key and the deque node as the value, we can retrieve the node in O(1) time.
This also helps in deleting *any node* with a particular key in O(1) time.

On retrieval, we remove the node from the deque and then add it to the front (it is now the most recently used). Both operations take O(1) time.

Here is the implementation:


class Node:
	def __init__(self, key, value):
		self.value = value
		self.next = None
		self.prev = None
		self.key = key

class LRU:
	def __init__(self, capacity = 10):
		self.capacity = capacity
		self.cache = {}
		self.front = None
		self.rear = None

	def set(self, key, value):
		if key in self.cache:
			# Update in place and mark as most recently used.
			self.cache[key].value = value
			self._move_to_front(key)
			return

		if len(self.cache) >= self.capacity:
			last_key = self.delete_last()
			del self.cache[last_key]

		new_node = Node(key, value)
		self._add_front(new_node)
		self.cache[key] = new_node

	def get(self, key):
		if key in self.cache:
			self._move_to_front(key)
			return self.cache[key].value
		return -1

	def _add_front(self, new_node):
		new_node.prev = None
		new_node.next = self.front
		if self.front is None:
			self.rear = new_node
		else:
			self.front.prev = new_node
		self.front = new_node

	def _move_to_front(self, key):
		current_node = self.cache[key]
		self._delete_node(current_node)
		self._add_front(current_node)

	def _delete_node(self, current_node):
		# Unlink the node, updating front/rear pointers as needed.
		if current_node.prev:
			current_node.prev.next = current_node.next
		else:
			self.front = current_node.next
		if current_node.next:
			current_node.next.prev = current_node.prev
		else:
			self.rear = current_node.prev

	def delete_last(self):
		last_node_key = self.rear.key
		self._delete_node(self.rear)
		return last_node_key

my_lru = LRU(capacity = 3)
my_lru.set(1,'a')
my_lru.set(2,'b')
my_lru.set(3,'c')
my_lru.set(4,'d')
my_lru.set(5,'e')
my_lru.set(6,'f')
print(my_lru.get(4))
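In practice, Python's standard library can express the same design very compactly: `collections.OrderedDict` remembers insertion order and supports `move_to_end` and `popitem` in O(1), which play the roles of our deque operations. A minimal sketch (the class name is mine, not from the text above):

```python
from collections import OrderedDict

class OrderedDictLRU:
	"""Same LRU behaviour as above, built on OrderedDict."""
	def __init__(self, capacity=10):
		self.capacity = capacity
		self.cache = OrderedDict()

	def set(self, key, value):
		if key in self.cache:
			self.cache.move_to_end(key)      # mark as most recently used
		self.cache[key] = value
		if len(self.cache) > self.capacity:
			self.cache.popitem(last=False)   # evict the least recently used entry

	def get(self, key):
		if key not in self.cache:
			return -1
		self.cache.move_to_end(key)          # a hit also refreshes recency
		return self.cache[key]

lru = OrderedDictLRU(capacity=3)
for k, v in [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]:
	lru.set(k, v)
print(lru.get(1))  # -1, key 1 was the least recently used and got evicted
print(lru.get(4))  # d
```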


Running Median


The median is the middle element of the array (in sorted order).

The challenge here is that we need to return the median in O(1) time.

Concept:

  1. We could maintain a sorted array, but sorting at each step costs us O(n log n).
  2. We maintain 2 buckets of equal size. The left bucket contains all values less than some number k and the right bucket contains all values greater than k.

With the 2nd method the insertion is more complex (and not O(1)), but we can easily compute the median in O(1).

To distribute numbers into the 2 buckets, we can use 2 heaps: a max-heap (left bucket) and a min-heap (right bucket).

If the current number is less than the max of max_heap_left, we insert it into max_heap_left; otherwise we insert it into min_heap_right.

This may create uneven heap sizes, so at each step we check whether the sizes differ by more than one and, if so, rebalance by moving the top of the larger heap to the smaller one.


Here is the implementation in Python.
Note: a max-heap is not directly supported by Python's heapq, so I used a min-heap and inserted (and popped) elements after multiplying the number by -1.



import heapq

def running_median(arr):

	max_heap_left = []
	min_heap_right = []

	for each in arr:

		# The left heap stores negated values, so -max_heap_left[0] is its max.
		if len(max_heap_left) == 0 or each < max_heap_left[0] * -1:
			heapq.heappush(max_heap_left, each * -1)
		else:
			heapq.heappush(min_heap_right, each)

		# Rebalance if the heap sizes differ by more than one.
		if abs(len(max_heap_left) - len(min_heap_right)) > 1:
			if len(max_heap_left) > len(min_heap_right):
				popped_number = max_heap_left[0] * -1
				heapq.heappop(max_heap_left)
				heapq.heappush(min_heap_right, popped_number)
			else:
				popped_number = min_heap_right[0]
				heapq.heappop(min_heap_right)
				heapq.heappush(max_heap_left, popped_number * -1)

	if len(max_heap_left) == len(min_heap_right):
		return (max_heap_left[0] * -1 + min_heap_right[0]) / 2.0
	else:
		if len(max_heap_left) > len(min_heap_right):
			return max_heap_left[0] * -1
		else:
			return min_heap_right[0]

arr = [1,2,3,4,5]

print(running_median(arr))
arr = [1,2,3,4,5,6]
print(running_median(arr))
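The function above only reports the final median. A genuinely running version keeps the two heaps as state so the median can be read after every insertion; here is a sketch along the same lines (the class name is mine):

```python
import heapq

class RunningMedian:
	"""Streaming variant: push numbers one at a time, read the median in O(1)."""
	def __init__(self):
		self.left = []   # max-heap via negated values
		self.right = []  # min-heap

	def add(self, x):
		if not self.left or x < -self.left[0]:
			heapq.heappush(self.left, -x)
		else:
			heapq.heappush(self.right, x)
		# Rebalance so the sizes never differ by more than one.
		if len(self.left) > len(self.right) + 1:
			heapq.heappush(self.right, -heapq.heappop(self.left))
		elif len(self.right) > len(self.left) + 1:
			heapq.heappush(self.left, -heapq.heappop(self.right))

	def median(self):
		if len(self.left) == len(self.right):
			return (-self.left[0] + self.right[0]) / 2.0
		return -self.left[0] if len(self.left) > len(self.right) else self.right[0]

rm = RunningMedian()
for x in [5, 2, 9, 1]:
	rm.add(x)
	print(rm.median())  # 5, then 3.5, then 5, then 3.5
```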