1
00:00:06,200 --> 00:00:08,780
Hey everybody, what's going on? This is Caleb with Devslopes.

2
00:00:08,780 --> 00:00:14,450
And in this video we're going to take what we just built in the previous video and we're going to set

3
00:00:14,450 --> 00:00:20,090
it up so that we can actually perform our facial recognition requests.

4
00:00:20,090 --> 00:00:27,040
So let's go back into our view controller here, and I'm going to go ahead and import the Vision framework.

5
00:00:27,170 --> 00:00:30,200
OK, Vision is the framework in question.

6
00:00:30,230 --> 00:00:36,570
That is the one we are going to use, which is very, very powerful and robust. There's a lot it can do.

7
00:00:36,590 --> 00:00:42,470
We're barely scratching the surface with this tutorial but in order to actually begin this we need to

8
00:00:42,470 --> 00:00:50,010
start by creating a function, OK, and that function is going to be called performVisionRequest.

9
00:00:50,260 --> 00:00:51,040
OK.

10
00:00:52,000 --> 00:00:52,730
There we go.

11
00:00:52,970 --> 00:00:54,110
Lovely.

12
00:00:54,170 --> 00:01:00,680
And so what we're going to need is we're going to need to create what is called a VNDetectFaceRectanglesRequest

13
00:01:00,680 --> 00:01:01,600
here.

14
00:01:01,730 --> 00:01:08,840
That is a lot of words, but it is basically a class that can analyze an image, identify faces, and return

15
00:01:08,840 --> 00:01:15,080
parameters for where it thinks those faces are and then we can use those parameters to generate shapes

16
00:01:15,680 --> 00:01:17,090
to place over the faces.

17
00:01:17,090 --> 00:01:19,670
So let's go ahead and create that request.

18
00:01:19,670 --> 00:01:27,190
let faceDetectionRequest equals VNDetect...

19
00:01:27,350 --> 00:01:32,380
And like I said, it's the VNDetectFaceRectanglesRequest, just like that.

20
00:01:32,420 --> 00:01:39,350
But if I push Enter on this, you can see that we get back what is called a request and an optional

21
00:01:39,530 --> 00:01:39,830
error,

22
00:01:39,830 --> 00:01:41,250
should there be a problem.

23
00:01:41,510 --> 00:01:45,140
And we're going to actually begin by handling that error if there is one.

24
00:01:45,140 --> 00:01:48,290
So if let error equals error.

25
00:01:48,740 --> 00:01:49,680
OK.

26
00:01:50,060 --> 00:01:55,490
If there happens to be an error and it is not nil, we're going to set it to a constant called error, and

27
00:01:55,490 --> 00:01:56,440
we'll print it out.

28
00:01:56,450 --> 00:02:05,350
So "Failed to detect face," and then we're going to go ahead and print out that error.

29
00:02:05,540 --> 00:02:09,440
After that we're going to go ahead and return because if there's an error there's no need for this request

30
00:02:09,440 --> 00:02:10,820
to continue.

31
00:02:10,850 --> 00:02:15,830
Now, assuming there is no error and that we are good to go, we're going to go ahead and actually pull out

32
00:02:16,340 --> 00:02:21,980
the results of our request and for every result we're going to go ahead and cycle through and pull out

33
00:02:21,980 --> 00:02:25,100
a parameter called a VNFaceObservation.

34
00:02:25,100 --> 00:02:29,840
So go ahead and type request.results.forEach.

35
00:02:29,840 --> 00:02:31,280
We're going to use a forEach loop.

36
00:02:31,410 --> 00:02:32,090
All righty.

37
00:02:32,450 --> 00:02:40,700
And for every results... result, sorry, singular. For every result, we're going to go ahead and create a face

38
00:02:40,700 --> 00:02:41,770
observation object.

39
00:02:41,810 --> 00:02:51,050
And it is optional, so let's use guard let to be safe: guard let faceObservation equals result, and

40
00:02:51,050 --> 00:02:54,860
we're going to cast it as VNFaceObservation.

41
00:02:54,860 --> 00:02:57,670
Otherwise we will return.
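
The request described so far can be sketched roughly as follows. This is a minimal sketch rather than the exact code from the video: the helper name makeFaceDetectionRequest is hypothetical (the video builds the request inline inside performVisionRequest), and optional chaining is used on results because that property is optional.

```swift
import Vision

// Hypothetical helper name (an assumption for illustration);
// the video creates this request inline inside performVisionRequest.
func makeFaceDetectionRequest() -> VNDetectFaceRectanglesRequest {
    return VNDetectFaceRectanglesRequest { request, error in
        // Handle the optional error first and stop early if there is one.
        if let error = error {
            print("Failed to detect face:", error)
            return
        }
        // results is optional, so use optional chaining before forEach.
        request.results?.forEach { result in
            // Skip any result that is not a face observation.
            guard let faceObservation = result as? VNFaceObservation else { return }
            print(faceObservation)
        }
    }
}
```

Nothing is printed until the request is actually performed by a request handler, which is the step that comes next.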

42
00:02:57,680 --> 00:03:04,280
So now, essentially... I should back up. Essentially, what we're going to do is eventually we're going to

43
00:03:04,280 --> 00:03:09,220
pass this request into what's called a VNImageRequestHandler.

44
00:03:09,320 --> 00:03:10,000
OK.

45
00:03:10,250 --> 00:03:12,790
And we're basically going to pass in the image.

46
00:03:12,800 --> 00:03:19,010
We're going to pass in this request and it's going to process the image and if there are any faces that

47
00:03:19,010 --> 00:03:24,020
it finds based on how it's been trained it's going to return an observation of that face.

48
00:03:24,020 --> 00:03:26,660
And we're going to see what's actually inside this object

49
00:03:26,660 --> 00:03:27,700
so we know what to do with it.

50
00:03:27,710 --> 00:03:32,540
So we're going to go ahead and print faceObservation.

51
00:03:32,740 --> 00:03:33,100
OK.

52
00:03:33,100 --> 00:03:38,810
And there's a property in here called boundingBox, and what it is, is essentially the coordinates

53
00:03:38,810 --> 00:03:43,400
and size of a box of the detected object or the face.

54
00:03:43,400 --> 00:03:45,620
So I'm going to go ahead and print that out.

55
00:03:45,620 --> 00:03:52,370
However we have not yet used this parameter so it's not going to do anything yet but we are going to

56
00:03:52,370 --> 00:03:53,470
get there right now.

57
00:03:53,600 --> 00:03:59,330
So let's go ahead and create the handler where we're going to actually pass in this request.

58
00:03:59,330 --> 00:04:08,920
Let's go ahead and type let imageRequestHandler equals VNImageRequestHandler, like so.

59
00:04:09,140 --> 00:04:14,330
And we'll put a constructor here at the very end, and we're going to choose the one that requires a CGImage

60
00:04:14,330 --> 00:04:16,350
and options.

61
00:04:16,610 --> 00:04:22,310
Now the options, we don't actually have any for this, so we're just going to use an empty dictionary.

62
00:04:22,850 --> 00:04:28,750
But we do need a CGImage, and we're going to have to pass that in at the very beginning, so performVisionRequest

63
00:04:28,760 --> 00:04:30,780
for image.

64
00:04:30,800 --> 00:04:33,970
And let's make that of type CGImage.

65
00:04:34,010 --> 00:04:41,690
Now the issue is that we need to actually have a CGImage, but the image in our app is of type

66
00:04:41,690 --> 00:04:42,380
UIImage.

67
00:04:42,470 --> 00:04:49,010
So we're going to go ahead and create a CGImage, like so, and we can actually go ahead and just type

68
00:04:49,070 --> 00:04:53,500
image and we can go ahead and just pull out the cgImage from within that.

69
00:04:53,510 --> 00:04:59,600
But it is optional because not every image is going to have an underlying CG... CGImage.

70
00:04:59,600 --> 00:05:01,010
It's hard to say.

71
00:05:01,100 --> 00:05:07,350
So what we're going to do is we're going to be safe and we're going to say else and we're going to print

72
00:05:07,410 --> 00:05:16,110
out an error in the instance that this doesn't work. We'll say "Could not find CGImage," just like

73
00:05:16,110 --> 00:05:16,400
that.

74
00:05:16,410 --> 00:05:17,750
And then we will return.

75
00:05:17,850 --> 00:05:24,600
That'll tell us whether or not our CGImage is legit, and in order to actually put this CGImage somewhere

76
00:05:24,600 --> 00:05:25,570
important,

77
00:05:25,680 --> 00:05:28,400
we're going to go ahead and pass it in right here.

78
00:05:28,410 --> 00:05:34,830
We'll just call our function, performVisionRequest for cgImage, and we'll pass it in. Boom, just like

79
00:05:34,830 --> 00:05:35,070
that.

80
00:05:35,070 --> 00:05:40,450
So now that we have sent in this CGImage, we can pass it in through our function here.

81
00:05:40,570 --> 00:05:43,440
It comes in as the parameter image.

82
00:05:43,800 --> 00:05:50,100
And what this request handler is going to do is we're going to be able to call a function called perform

83
00:05:50,550 --> 00:05:57,330
and we can pass in our request. It'll do all of the work for the image and then return results to us

84
00:05:57,390 --> 00:06:02,710
and actually print out, you know, this bounding box. It'll do all of the work that we've asked it

85
00:06:02,730 --> 00:06:03,750
to do.

86
00:06:03,750 --> 00:06:12,690
Now if we go ahead and we try to call imageRequestHandler.perform, you'll see that it throws

87
00:06:12,780 --> 00:06:13,510
errors.

88
00:06:13,510 --> 00:06:15,220
OK, I'm sure you saw that.

89
00:06:15,240 --> 00:06:19,710
So we're going to need to actually put this inside of a do-catch block to be safe.

90
00:06:19,710 --> 00:06:21,080
So go ahead and type do.

91
00:06:21,330 --> 00:06:26,940
And then at the bottom type catch, and that's where thrown errors are going to be handled, which is exactly what

92
00:06:26,940 --> 00:06:27,910
we want.

93
00:06:28,200 --> 00:06:30,840
And go ahead and cut this and paste it in.

94
00:06:30,850 --> 00:06:37,350
However, we still need to call try at the beginning. We need to tell it to try, and if it can't, then it

95
00:06:37,350 --> 00:06:40,110
will throw errors here. In that instance,

96
00:06:40,140 --> 00:06:44,720
we're going to print something like a message telling what happened.

97
00:06:44,730 --> 00:06:52,380
"Failed to perform image request," and then we're going to actually print out the error itself, but we'll

98
00:06:52,380 --> 00:06:55,740
get its localizedDescription so that it's a little easier to deal with.

99
00:06:55,740 --> 00:07:00,570
So there is that and then we can go ahead and return.

100
00:07:00,570 --> 00:07:03,810
No need to go further if there is an error like so.

101
00:07:04,230 --> 00:07:10,800
And now assuming there's not an error and that we can in fact perform our request we can pass in our

102
00:07:10,800 --> 00:07:12,640
faceDetectionRequest.

103
00:07:12,720 --> 00:07:15,160
However it is expecting an array.

104
00:07:15,300 --> 00:07:21,510
And what that means is that you could pass in 500 requests and what it would do is it would parse through

105
00:07:21,690 --> 00:07:27,870
each one in order and perform all of the requests that you ask, whether it's scanning faces, identifying

106
00:07:27,870 --> 00:07:32,380
barcodes or rectangles, you know, tracking, things like that.

107
00:07:32,550 --> 00:07:34,250
Vision is so cool.

108
00:07:34,590 --> 00:07:35,280
OK.

109
00:07:35,460 --> 00:07:38,300
So we have now passed in our image.

110
00:07:38,460 --> 00:07:41,630
We have now performed our request.
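
The handler steps described above can be sketched like this. It is a sketch under stated assumptions, not the exact code from the video: the requests parameter and the detectFaces(in:using:) wrapper are hypothetical additions that keep the example self-contained, whereas the video creates the request inside performVisionRequest and calls it from the view controller.

```swift
import UIKit
import Vision

// The `requests` parameter is an assumption made for this sketch;
// in the video the face detection request is created inside this function.
func performVisionRequest(for image: CGImage, requests: [VNRequest]) {
    // No options are needed here, so pass an empty dictionary.
    let imageRequestHandler = VNImageRequestHandler(cgImage: image, options: [:])
    do {
        // perform(_:) takes an array of requests and can throw.
        try imageRequestHandler.perform(requests)
    } catch {
        print("Failed to perform image request:", error.localizedDescription)
        return
    }
}

// Hypothetical call site: unwrap the optional cgImage before handing it over.
func detectFaces(in uiImage: UIImage, using request: VNRequest) {
    guard let cgImage = uiImage.cgImage else {
        print("Could not find CGImage")
        return
    }
    performVisionRequest(for: cgImage, requests: [request])
}
```

Because perform(_:) accepts an array, a single face detection request is still wrapped in brackets when it is passed in.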

111
00:07:41,700 --> 00:07:48,750
And with that in mind we should get an observation back and we should get a bounding box that will print

112
00:07:48,750 --> 00:07:49,030
out.

113
00:07:49,050 --> 00:07:54,210
So I don't know about you but I think it would be great to try this and just see what we get printing

114
00:07:54,210 --> 00:07:54,420
out.

115
00:07:54,440 --> 00:08:02,340
And let's go ahead and print a little tag here at the beginning, "Bounding box," and maybe a line

116
00:08:02,340 --> 00:08:09,420
break, and that will help us to see where our stuff is. So let's build and run this, and our function should

117
00:08:09,420 --> 00:08:10,460
get called right away

118
00:08:10,500 --> 00:08:13,340
when our app loads, and then we should get a nice little printout.

119
00:08:13,380 --> 00:08:17,230
So it's loading, doing its thing.

120
00:08:17,250 --> 00:08:20,310
Oh, and we already got one. That is really cool stuff.

121
00:08:20,310 --> 00:08:21,300
Check this out.

122
00:08:21,300 --> 00:08:23,970
We get what is called a bounding box.

123
00:08:24,210 --> 00:08:30,300
And just so you know, if we go into the boundingBox property, you'll see that it's a CGRect, but

124
00:08:30,300 --> 00:08:31,770
we need to know what that actually means.

125
00:08:31,770 --> 00:08:38,580
So if we dive deeper into VNFaceObservation and go find boundingBox, we can get a nice little description

126
00:08:38,850 --> 00:08:40,250
of what that actually means.

127
00:08:40,260 --> 00:08:41,060
And you know what?

128
00:08:41,150 --> 00:08:42,570
Maybe easier just to.

129
00:08:42,750 --> 00:08:43,470
Yes, to do that.

130
00:08:43,470 --> 00:08:44,090
OK.

131
00:08:44,430 --> 00:08:47,900
It is the bounding box of the detected object.

132
00:08:47,910 --> 00:08:52,470
The coordinates are normalized to the dimensions of the processed image.

133
00:08:52,590 --> 00:08:59,580
Meaning they're not exactly, you know, the right coordinates. They're in reference to the actual size of

134
00:08:59,580 --> 00:09:04,850
the image, so we need to do some math to scale them to fit the image in the view.

135
00:09:05,220 --> 00:09:09,040
And the origin of those points is the image's lower-left corner.

136
00:09:09,060 --> 00:09:17,110
So down here. This is interesting because typically, with CGRect rectangles or images or anything,

137
00:09:17,130 --> 00:09:21,540
the top-left corner is usually the origin point, so it's a little interesting. We have to do some math

138
00:09:21,540 --> 00:09:23,160
to kind of flip things.

139
00:09:23,160 --> 00:09:26,190
But what you need to know is this is the x coordinate.

140
00:09:26,220 --> 00:09:29,720
This is the y coordinate for the origin point of the box.

141
00:09:29,820 --> 00:09:33,430
And then this is the length, or I guess the width, and the height.

142
00:09:33,630 --> 00:09:36,480
So it's showing you all four parameters.
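
As a preview of the math mentioned above, here is a minimal sketch of scaling and flipping a normalized bounding box. The function name viewRect(forNormalized:in:) is hypothetical, and the sketch assumes the image fills the target area exactly (no aspect-fit letterboxing), which may not match the layout used in the next video.

```swift
import Foundation

// Hypothetical helper (an assumption for illustration).
// `box` is normalized (0...1) with its origin at the lower-left corner;
// the result uses a top-left origin and the units of `size`.
func viewRect(forNormalized box: CGRect, in size: CGSize) -> CGRect {
    // Scale the normalized values up to the target size.
    let width = box.width * size.width
    let height = box.height * size.height
    let x = box.minX * size.width
    // Flip vertically: measure down from the top edge of the box.
    let y = (1 - box.maxY) * size.height
    return CGRect(x: x, y: y, width: width, height: height)
}
```

For example, a box of (x: 0.25, y: 0.5, width: 0.5, height: 0.25) in a 200 by 400 area maps to (x: 50, y: 100, width: 100, height: 100).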

143
00:09:36,480 --> 00:09:38,880
This is really great. That's what we need.

144
00:09:39,060 --> 00:09:44,580
So let's go ahead and with these parameters head over to the next video where we're going to actually

145
00:09:44,640 --> 00:09:46,530
put this into something valuable.

146
00:09:46,590 --> 00:09:52,860
You know what this means, though: Vision is doing the work for us. It found a face and it's telling us

147
00:09:52,920 --> 00:09:58,950
exactly where it is. In the next video, we're going to do some math and drop a UIView on the screen

148
00:09:59,220 --> 00:10:05,460
to outline the face based on the values that we've received from Vision. So cool, super easy actually.

149
00:10:05,480 --> 00:10:06,790
And let's head over there now.

150
00:10:06,830 --> 00:10:07,500
Let's do it.