1
00:00:00,110 --> 00:00:01,130
Welcome back.

2
00:00:01,130 --> 00:00:06,080
Let's now discuss how to solve the edit distance question.

3
00:00:06,080 --> 00:00:13,730
Now we will not directly start with the bottom up or tabulation approach, but rather first we will

4
00:00:13,730 --> 00:00:16,190
write the recursive solution for this question.

5
00:00:16,190 --> 00:00:20,450
Then we will memorize it by just adding a few lines of code.

6
00:00:20,450 --> 00:00:26,480
And once we are done with these two steps, we will proceed and take a look at the tabulation or bottom

7
00:00:26,480 --> 00:00:31,910
up approach for solving the edit distance question, and then come up with the space optimized bottom

8
00:00:31,910 --> 00:00:32,660
up approach.

9
00:00:32,660 --> 00:00:36,950
So let's get started with the recursive approach for solving this question.

10
00:00:36,950 --> 00:00:43,820
Now, because we are thinking of a recursive approach, we will have to think of the base case and the

11
00:00:43,820 --> 00:00:49,400
recursive case, because we'll be writing a recursive function and we will be recursively calling it

12
00:00:49,400 --> 00:00:50,900
inside the body of the function.

13
00:00:50,900 --> 00:00:59,720
So first let's think of what choices are available when we try to convert word one to word two.

14
00:00:59,750 --> 00:01:03,800
Now now for this let's make use of a choice diagram.

15
00:01:03,800 --> 00:01:06,470
And let's consider two possibilities.

16
00:01:06,470 --> 00:01:11,150
So let's say in one case we are going from zero to n.

17
00:01:11,150 --> 00:01:18,230
Remember whenever we write recursive solutions you can either go from zero to n or n to zero.

18
00:01:18,380 --> 00:01:20,660
So you can do it in both ways.

19
00:01:20,660 --> 00:01:23,840
Over here let's think of the zero to n approach.

20
00:01:23,840 --> 00:01:30,650
So let's say you're comparing the zeroth character in A, B, C and a PC.

21
00:01:30,680 --> 00:01:33,650
So this is word one and this is word two.

22
00:01:33,650 --> 00:01:36,830
And you're trying to convert this word to this word over here.

23
00:01:36,830 --> 00:01:43,760
And let's say we are comparing the characters at index zero in these two strings over here.

24
00:01:43,760 --> 00:01:45,920
So we see that they are equal.

25
00:01:46,600 --> 00:01:54,280
And if the characters are equal, then we do not need to do any operations to convert this to this because

26
00:01:54,280 --> 00:02:01,240
they are already equal and therefore we can just move ahead and the remaining subproblem becomes this

27
00:02:01,240 --> 00:02:03,610
part over here and this part over here.

28
00:02:03,640 --> 00:02:09,040
So this is one case where we see that the characters that we are comparing are equal.

29
00:02:09,190 --> 00:02:15,910
Now on the other side, if the characters that we are comparing are not equal, what would be the choices

30
00:02:15,910 --> 00:02:16,810
that we have.

31
00:02:16,840 --> 00:02:22,180
So over here again we are comparing the characters at the indices zero and zero.

32
00:02:22,180 --> 00:02:25,840
And we see that we have T over here and we have B over here.

33
00:02:25,840 --> 00:02:27,580
And they are not equal.

34
00:02:27,610 --> 00:02:34,420
Now the question says that we have three possible operations that can be done.

35
00:02:34,420 --> 00:02:41,140
So the possible operations are insertion deletion and replacing characters.

36
00:02:41,170 --> 00:02:46,000
So let's go ahead and write these two strings for these three scenarios.

37
00:02:46,000 --> 00:02:52,600
To analyze what would be happening when we do these three operations respectively.

38
00:02:52,630 --> 00:02:54,760
So over here we have insertion.

39
00:02:54,760 --> 00:02:56,650
And again what is the case.

40
00:02:56,650 --> 00:02:58,180
We are comparing T and B.

41
00:02:58,180 --> 00:03:01,030
And we have seen that these two are not equal.

42
00:03:01,030 --> 00:03:03,310
And we are going from zero to n.

43
00:03:03,880 --> 00:03:11,740
Now if this is the case inserting a character would mean adding a b over here so that this b and this

44
00:03:11,770 --> 00:03:13,300
b are now equal.

45
00:03:13,330 --> 00:03:17,680
Now notice that initially the pointers were over here.

46
00:03:18,650 --> 00:03:19,910
And over here.

47
00:03:19,910 --> 00:03:22,850
And we saw that T and B are not equal.

48
00:03:22,850 --> 00:03:25,280
And we inserted a b over here.

49
00:03:25,280 --> 00:03:30,020
And therefore this pointer can still stay over here itself.

50
00:03:30,020 --> 00:03:34,940
But because we know that these two are now equal we would have to move this ahead.

51
00:03:35,000 --> 00:03:38,810
So the pointers now would be over here and over here.

52
00:03:38,810 --> 00:03:46,130
So that means that the pointer which was used for iterating over the characters for word two would have

53
00:03:46,130 --> 00:03:51,230
to be moved in this case to identify the remaining subproblems.

54
00:03:51,710 --> 00:03:54,410
Now what about deletion again?

55
00:03:54,410 --> 00:03:55,610
What's the scenario?

56
00:03:55,610 --> 00:03:59,090
We are comparing the characters over here and over here.

57
00:03:59,650 --> 00:04:02,260
And we see that these two are not equal.

58
00:04:02,260 --> 00:04:05,380
And we want to convert word one to word two.

59
00:04:05,410 --> 00:04:10,780
Now, because these two are not equal, deletion would mean that we are deleting this character over

60
00:04:10,780 --> 00:04:11,230
here.

61
00:04:11,320 --> 00:04:13,060
So this would be deleted.

62
00:04:13,060 --> 00:04:20,620
And because this is deleted, we would have to move this pointer ahead to come to the next character

63
00:04:20,620 --> 00:04:21,340
over here.

64
00:04:21,340 --> 00:04:27,730
So notice that we would have to move the pointer used to iterate over the characters of word one.

65
00:04:27,730 --> 00:04:31,780
And this pointer stays where it was initially itself.

66
00:04:31,780 --> 00:04:38,440
So this is how we can identify the subproblem when we do the deletion operation.

67
00:04:38,860 --> 00:04:41,380
Now what about replacing characters?

68
00:04:41,380 --> 00:04:45,790
Again the scenario is that we are comparing these two characters.

69
00:04:45,790 --> 00:04:50,530
We see that they are not equal and we want to convert word one to word two.

70
00:04:50,530 --> 00:04:57,040
Now, replacing this character would mean that instead of T over here we would write B.

71
00:04:58,140 --> 00:05:04,470
Now we know that these two are equal, and because these two are equal, we would have to move both

72
00:05:04,470 --> 00:05:05,220
the pointers.

73
00:05:05,220 --> 00:05:11,130
Because initially this pointer was pointing over here and this pointer was pointing to this character.

74
00:05:11,160 --> 00:05:13,350
Now we know that these two are equal.

75
00:05:13,350 --> 00:05:20,910
And therefore we would have to move both the pointers or both the variables used to iterate over the

76
00:05:20,910 --> 00:05:23,400
indices of word one and word two.

77
00:05:23,430 --> 00:05:32,850
So this over here helps us understand how to identify the subproblem in case of insertion, deletion

78
00:05:32,850 --> 00:05:35,280
and replacing characters.

79
00:05:35,310 --> 00:05:42,390
So to solve this question, we would just have to write a recursive function which takes two values

80
00:05:42,390 --> 00:05:43,380
as inputs.

81
00:05:43,380 --> 00:05:50,070
And we will use those two values as pointers to characters in word one and word two.

82
00:05:50,070 --> 00:05:51,990
And then we will compare characters.

83
00:05:51,990 --> 00:05:57,900
If we see that they are equal, then we would just proceed with the remaining subproblem.

84
00:05:57,900 --> 00:06:04,980
And if we see that they are not equal, we would calculate three versions of making word one to word

85
00:06:05,010 --> 00:06:05,370
two.

86
00:06:05,400 --> 00:06:09,030
So we would say insertion is equal to one plus.

87
00:06:09,030 --> 00:06:11,760
And then we would recursively call the same function.

88
00:06:11,760 --> 00:06:16,530
And we would pass the indices as we have discussed over here to the function.

89
00:06:16,530 --> 00:06:19,680
And we would say deletion is equal to one plus.

90
00:06:19,680 --> 00:06:21,900
Again we would call the recursive function again.

91
00:06:21,900 --> 00:06:24,630
And we would pass these indices over here.

92
00:06:24,630 --> 00:06:28,050
And we would say replace is equal to one plus.

93
00:06:28,050 --> 00:06:34,440
And then again we would recursively call the same function and increment both the pointers as we have

94
00:06:34,440 --> 00:06:35,400
discussed over here.

95
00:06:35,400 --> 00:06:39,240
And then these three would give us three respective values.

96
00:06:39,240 --> 00:06:45,810
And the answer would be the minimum of these three, because the question asks us to identify the minimum

97
00:06:45,810 --> 00:06:50,400
number of operations required to change word one to word two.

98
00:06:50,430 --> 00:06:57,270
So we have discussed the recursive case for the recursive solution for the edit distance question.

99
00:06:57,270 --> 00:07:03,750
And now let's also proceed and discuss the base condition for this approach.

100
00:07:03,750 --> 00:07:09,120
Now to understand and discuss the base condition, let's take three examples.

101
00:07:09,120 --> 00:07:15,030
And again we could write the solution either going from zero to n or n to zero.

102
00:07:15,030 --> 00:07:18,780
And the base case will vary based on which approach we take.

103
00:07:18,780 --> 00:07:22,230
But over here for our discussion we are going from zero to n.

104
00:07:22,230 --> 00:07:25,110
So let's think of these three examples.

105
00:07:25,110 --> 00:07:34,260
So let's say in one case we are given a, b, C and Pqr the second example is a b c and p q r s t,

106
00:07:34,950 --> 00:07:39,840
and the third example is a b c d e f and p q r.

107
00:07:39,930 --> 00:07:47,490
Now in this example, let's say that we have reached over here for word one, and we have reached over

108
00:07:47,490 --> 00:07:52,740
here for word two because we were using two pointers to iterate over these characters.

109
00:07:52,740 --> 00:07:58,800
So again this scenario indicates that there's nothing left in word one and nothing left in word two.

110
00:07:58,800 --> 00:08:04,590
And if this is the base case that we encounter, then we can just return zero.

111
00:08:04,590 --> 00:08:10,350
Because to change an empty string to an empty string, we don't need any operations.

112
00:08:10,590 --> 00:08:12,630
Now, what about this case over here?

113
00:08:12,630 --> 00:08:19,530
This would mean that we have reached over here with the pointer that was used to iterate over the characters

114
00:08:19,530 --> 00:08:25,830
for word one, and we have reached over here with the pointer that was used to iterate over the characters

115
00:08:25,830 --> 00:08:26,940
for word two.

116
00:08:26,970 --> 00:08:33,810
So this subproblem over here would mean that we would need to convert an empty string to s t.

117
00:08:33,930 --> 00:08:40,680
So for this we would need two insert operations, or we would need to insert s and t.

118
00:08:40,680 --> 00:08:45,750
And therefore if this is the base case that we encounter, we should return two.

119
00:08:45,930 --> 00:08:48,060
Now what about this case over here.

120
00:08:48,060 --> 00:08:54,450
This case over here would indicate that the pointer which was used to iterate over the characters for

121
00:08:54,450 --> 00:08:56,250
word one is over here.

122
00:08:56,250 --> 00:09:01,020
And the pointer, which was used to iterate over the characters for word two is over here.

123
00:09:01,020 --> 00:09:09,060
So the subproblem at hand is to convert d, e, f to an empty string, and for this we would need three

124
00:09:09,060 --> 00:09:12,000
operations or three delete operations.

125
00:09:12,000 --> 00:09:14,880
So in this case we would have to return three.

126
00:09:14,880 --> 00:09:21,600
So these are the three possible base cases that we could encounter based on the length of the input

127
00:09:21,600 --> 00:09:23,760
strings word one and word two.

128
00:09:23,760 --> 00:09:31,200
So over here if both the pointers become out of bound or greater than the last valid indices respectively,

129
00:09:31,200 --> 00:09:32,460
we can return zero.

130
00:09:32,460 --> 00:09:39,000
And if only one of them becomes greater than the last valid index and there are still characters left

131
00:09:39,000 --> 00:09:45,270
in the other word, then we would have to return the number of characters that are left in the other

132
00:09:45,270 --> 00:09:45,510
word.

133
00:09:45,510 --> 00:09:50,940
So notice over here we are returning two and over here we are returning three.
