1
00:00:00,150 --> 00:00:02,160
Hey guys, welcome to day 31

2
00:00:02,160 --> 00:00:07,080
of 100 Days of Code. And you've been learning quite a lot of things

3
00:00:07,080 --> 00:00:10,470
now. We've been looking at handling exceptions,

4
00:00:10,770 --> 00:00:15,120
using the JSON data format, parsing and reading CSVs

5
00:00:15,120 --> 00:00:19,950
using pandas, opening and writing to files and a whole lot more.

6
00:00:20,310 --> 00:00:24,270
So it's time for your capstone project.

7
00:00:24,960 --> 00:00:26,490
And in this capstone project,

8
00:00:26,550 --> 00:00:30,210
we're going to be building a flashcard program to help you study.

9
00:00:30,810 --> 00:00:35,490
And it's especially great with studying for languages. Now,

10
00:00:35,550 --> 00:00:36,540
when I was in school,

11
00:00:36,570 --> 00:00:41,570
I studied French and it was a lot of vocab that we had to learn for tests.

12
00:00:42,780 --> 00:00:46,070
Pomme is apple, and I would memorize all these words,

13
00:00:47,000 --> 00:00:50,900
all of the grammar tables, and yet after four or five years,

14
00:00:51,290 --> 00:00:53,150
I couldn't really speak much French.

15
00:00:53,810 --> 00:00:58,100
So I decided to go for immersive language learning.

16
00:00:58,220 --> 00:00:59,390
I went to France,

17
00:00:59,810 --> 00:01:04,810
I hung out with friends and I tried to immerse myself in the language and

18
00:01:06,500 --> 00:01:10,520
culture. But that also failed. I had a lot of fun,

19
00:01:10,520 --> 00:01:13,670
but my French didn't seem to improve all that much.

20
00:01:14,420 --> 00:01:19,420
But then I discovered a new way of learning languages and it all started with

21
00:01:20,030 --> 00:01:24,710
looking at Chinese characters. There's a lot of Chinese characters out there.

22
00:01:24,710 --> 00:01:29,710
There's something like 50,000 Chinese characters in total from history to now.

23
00:01:31,790 --> 00:01:34,940
There's a lot of characters that you could learn. They each have a different

24
00:01:34,940 --> 00:01:37,670
meaning and they each have a different pronunciation.

25
00:01:38,450 --> 00:01:43,250
Imagine that. Trying to learn 50,000 characters. That's no easy feat.

26
00:01:43,490 --> 00:01:48,350
But then a friend told me that actually you don't really need 50,000

27
00:01:48,380 --> 00:01:52,850
characters. Your average professor, who is very eloquent,

28
00:01:53,180 --> 00:01:56,150
can write a lot of the characters and use them with ease,

29
00:01:56,510 --> 00:02:01,510
only knows about 10,000 and your average person probably only uses about 8,000

30
00:02:03,470 --> 00:02:08,240
in their day to day lives. And if you basically just want to get by in life,

31
00:02:08,300 --> 00:02:12,950
you can pretty much rely on the 3000 words that an average teenager would know.

32
00:02:13,880 --> 00:02:18,020
And finally, if you actually just want to be able to watch some simple movies,

33
00:02:18,050 --> 00:02:19,520
read some simple books,

34
00:02:19,760 --> 00:02:24,320
then you could use the average kid's vocabulary of about 1000 characters.

35
00:02:24,920 --> 00:02:28,700
And at this point, I think to myself, 1000. That's quite doable.

36
00:02:28,970 --> 00:02:32,330
I could do 1000. If I learn just 10 characters a day,

37
00:02:32,720 --> 00:02:35,720
that will take me less than a year to learn all of these characters.

38
00:02:36,530 --> 00:02:39,980
But it's not just 1000 random characters either.

39
00:02:40,550 --> 00:02:42,920
There's such a thing as a frequency dictionary.

40
00:02:42,950 --> 00:02:46,310
So a dictionary that's not listed here by A, B, C, D,

41
00:02:46,640 --> 00:02:51,640
but it's actually listed by the frequency that a particular word occurs in common

42
00:02:51,740 --> 00:02:54,170
usage. For example,

43
00:02:54,410 --> 00:02:59,410
if you take the first 1000 characters that are most commonly used,

44
00:03:00,280 --> 00:03:03,160
then you can pretty much read most of the newspapers,

45
00:03:03,160 --> 00:03:06,160
you can watch most of the TV shows because these

46
00:03:06,160 --> 00:03:10,870
are the words that are the bread and butter of the language. It's like in English,

47
00:03:11,110 --> 00:03:14,530
the a, the, of, from, why, yes, no,

48
00:03:14,740 --> 00:03:17,920
these are words that we use every day, again and again.

49
00:03:18,430 --> 00:03:23,430
The crazy words like anti-establishment or glioblastoma.

50
00:03:24,640 --> 00:03:28,870
These are not words that you need to really know for day to day life.

51
00:03:29,770 --> 00:03:32,080
So let me show you the program that you'll build

52
00:03:32,110 --> 00:03:35,530
where you can learn the most frequently used words in any language.

53
00:03:36,130 --> 00:03:38,080
It's a flashcard program,

54
00:03:38,620 --> 00:03:43,360
and it shows you the front and the back of the card. So for example,

55
00:03:43,390 --> 00:03:48,390
French, demande. In English means request. After three seconds,

56
00:03:48,850 --> 00:03:52,990
the card flips and I can check whether I knew the right answer.

57
00:03:53,500 --> 00:03:57,070
If I got it right I'll press the tick and if I got it wrong

58
00:03:57,100 --> 00:04:00,370
I'll press the cross. So let's try another word.

59
00:04:00,700 --> 00:04:04,840
Parti means left or to leave. Attendez

60
00:04:04,840 --> 00:04:07,360
means to wait

61
00:04:07,960 --> 00:04:12,490
and I think I knew that word. So I'm going to click the check mark.

62
00:04:12,880 --> 00:04:16,750
And what that's going to do is it's going to take the flashcard out of all of

63
00:04:16,750 --> 00:04:20,589
the list of flashcards so it doesn't show me the things I already know.

64
00:04:21,339 --> 00:04:24,070
And instead it only shows me the things I don't know

65
00:04:24,280 --> 00:04:28,840
so I can review it and say, Oh, I'm not sure what loin means

66
00:04:28,900 --> 00:04:30,490
so I'll say cross,

67
00:04:30,790 --> 00:04:35,380
and that will go back into the deck and it might come up again at some point.
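
The tick and cross behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the course's solution: the names `to_learn`, `next_card` and `mark_known`, the "French"/"English" keys and the sample words are all assumptions made for the example.

```python
import random

# Hypothetical sample deck; the real project loads its cards from a CSV file.
to_learn = [
    {"French": "demande", "English": "request"},
    {"French": "parti", "English": "left"},
    {"French": "attendez", "English": "wait"},
    {"French": "loin", "English": "far"},
]


def next_card(cards):
    """Pick a random card from the cards that are still to be learned."""
    return random.choice(cards)


def mark_known(cards, card):
    """Tick button: remove the card so it is not shown again."""
    cards.remove(card)


# Cross button: nothing is removed, so the card stays in the deck
# and may come up again on a later draw.

known = to_learn[2]          # pretend the user recognised "attendez"
mark_known(to_learn, known)  # the deck now holds only the unknown words
```

Keeping a single list of cards still to learn means the cross button needs no extra bookkeeping: an unknown card is simply left where it is.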

68
00:04:36,100 --> 00:04:40,480
So this beautiful piece of software is what we're going to be creating.

69
00:04:40,960 --> 00:04:42,190
But more specifically,

70
00:04:42,220 --> 00:04:46,930
you're going to be creating because after all, this is your capstone project.

71
00:04:47,380 --> 00:04:48,070
But don't worry,

72
00:04:48,070 --> 00:04:53,070
I've divided it up into four steps and I've got some step by step instructions for

73
00:04:53,140 --> 00:04:57,400
you in the next lesson. Now, if you're wondering,

74
00:04:57,520 --> 00:05:00,970
how did you get the most frequent words for the flashcard app in the first

75
00:05:00,970 --> 00:05:02,890
place? Well, let me show you.

76
00:05:03,520 --> 00:05:08,520
There's a Wiki for the frequency list of different languages,

77
00:05:09,010 --> 00:05:12,790
and it lists most of the common languages. If we go to French,

78
00:05:12,820 --> 00:05:17,320
you can see that there are loads of different lists that people have compiled

79
00:05:17,530 --> 00:05:21,040
that list the top, most frequently occurring words.

80
00:05:21,700 --> 00:05:26,700
And one of the ones that I thought was really relevant is the one where the words are based on

81
00:05:26,710 --> 00:05:27,610
subtitles.

82
00:05:28,120 --> 00:05:33,120
These subtitles come from all sorts of shows and movies that are relevant to modern

83
00:05:33,730 --> 00:05:37,360
culture. And when you look at one of the subtitles,

84
00:05:37,660 --> 00:05:39,370
this is one of my favorite shows by the way,

85
00:05:39,700 --> 00:05:43,720
you can see that the subtitles are listed by language.

86
00:05:44,110 --> 00:05:48,490
And if we pick out one which is in English and we take a look at it,

87
00:05:49,960 --> 00:05:51,130
then you can see

88
00:05:51,190 --> 00:05:55,960
it's basically just all the words that are spoken in the movie or in the show

89
00:05:56,410 --> 00:05:59,780
and it's been transcribed into subtitles. Now

90
00:05:59,780 --> 00:06:04,010
then if we take all of these words that are from the most commonly watched

91
00:06:04,010 --> 00:06:07,550
movies and shows, we end up with these frequency lists.

92
00:06:08,000 --> 00:06:09,620
So if we take a look here,

93
00:06:09,620 --> 00:06:14,620
it shows the most frequent words from 1 to 5,000.

94
00:06:16,280 --> 00:06:17,330
And at the very beginning,

95
00:06:17,330 --> 00:06:22,330
it's your I, of, is, all of these things that are really common.

96
00:06:22,550 --> 00:06:25,610
And then as you scroll down, you get to some longer words.

97
00:06:26,090 --> 00:06:28,100
And if you scroll to the bottom,

98
00:06:28,340 --> 00:06:31,460
you can see you're getting some more and more niche words.

99
00:06:32,600 --> 00:06:37,600
These frequency lists are compiled by a user called Hermitd

100
00:06:38,750 --> 00:06:41,480
and Hermitd is Hermit Dave.

101
00:06:41,900 --> 00:06:46,900
And he has a GitHub repository where he's compiled all of the frequency words,

102
00:06:48,980 --> 00:06:52,190
and you can see the latest version from 2018.

103
00:06:52,910 --> 00:06:56,840
Now he's got all of the frequency words for many languages

104
00:06:56,930 --> 00:06:58,970
and it's listed by the language code,

105
00:06:59,330 --> 00:07:01,940
so French would be FR for example.

106
00:07:02,600 --> 00:07:07,600
And here you can see the top 50,000 most frequent words or the full entire list.

107
00:07:09,950 --> 00:07:13,460
We're probably not gonna learn more than 1000,

108
00:07:13,700 --> 00:07:16,250
and I'm certainly not going to get to 50,000.

109
00:07:16,730 --> 00:07:21,620
But this data here lists all the words that he found in these subtitles and the

110
00:07:21,620 --> 00:07:23,390
frequency that they occurred.

111
00:07:23,930 --> 00:07:27,260
And once they've been sorted in order of frequency,

112
00:07:27,560 --> 00:07:29,000
this is what you end up with.
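
As a rough sketch of how such a frequency list could be produced from subtitle text, the snippet below counts words and sorts them by how often they occur. It is an assumption-laden illustration, not the method actually used for the published lists; `frequency_list` and the sample sentence are made up for the example.

```python
import re
from collections import Counter


def frequency_list(text, top_n=None):
    """Return (word, count) pairs, most frequent first."""
    # Lowercase the text and keep runs of letters only
    # (digits, underscores and punctuation are dropped).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(top_n)


# Hypothetical stand-in for a subtitle file's contents.
sample = "I know you know that I know"
ranked = frequency_list(sample)
# ranked starts with ('know', 3), then ('i', 2)
```

Passing `top_n=1000` would keep only the thousand most frequent words, which is the kind of cut-off discussed earlier.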

113
00:07:30,860 --> 00:07:33,710
So I've already studied some of the first 200 words.

114
00:07:34,220 --> 00:07:37,340
So if I take a hundred words from this frequency dictionary

115
00:07:37,820 --> 00:07:42,200
and I put it into a Google sheet, then I end up with something like this.

116
00:07:42,890 --> 00:07:43,130
Now,

117
00:07:43,130 --> 00:07:47,180
what I want to be able to do is to create a flashcard where the front of the

118
00:07:47,180 --> 00:07:49,670
flashcard is the word in French,

119
00:07:50,000 --> 00:07:54,170
and then on the back of the flashcard is the answer in English for what

120
00:07:54,170 --> 00:07:58,460
that word means. Instead of having to flip through a dictionary

121
00:07:58,460 --> 00:08:00,620
finding out the meaning of each of these words,

122
00:08:00,920 --> 00:08:03,920
there's actually a really neat trick in Google Sheets that I want to show you.

123
00:08:04,610 --> 00:08:06,500
If you hit equals to

124
00:08:06,530 --> 00:08:10,820
start a new formula and you type in GOOGLETRANSLATE,

125
00:08:11,630 --> 00:08:13,760
you can see it expects some inputs.

126
00:08:14,180 --> 00:08:16,880
First is the piece of text that you want to translate,

127
00:08:16,880 --> 00:08:20,810
so I'm going to click on this cell, and then it's the source language.

128
00:08:20,810 --> 00:08:24,380
So this is the language as a code. So for example,

129
00:08:24,530 --> 00:08:29,530
Spanish is ES and French is FR. And then the final input

130
00:08:30,590 --> 00:08:34,340
it expects is the language code that you want to translate it to.

131
00:08:34,730 --> 00:08:38,900
So in this case, I want to translate it to English. So I'm going to use en.

132
00:08:39,559 --> 00:08:41,570
And then we can close off the parentheses, hit

133
00:08:41,570 --> 00:08:44,690
enter and after a little while with good internet,

134
00:08:44,930 --> 00:08:48,500
you'll see the English translation for this word. And of course,

135
00:08:48,500 --> 00:08:49,520
because we're in Excel,

136
00:08:49,520 --> 00:08:54,520
we can simply just drag this cross all the way down to all of our words.

137
00:08:54,900 --> 00:08:56,550
And after a little while, bam!

138
00:08:57,240 --> 00:09:00,660
It's translated all of those words into English.

139
00:09:01,680 --> 00:09:06,540
So this is a really neat trick and I'll link to the docs for this particular

140
00:09:06,540 --> 00:09:07,373
formula.

141
00:09:07,740 --> 00:09:12,740
And also you can take a look at the language support that Google's translation

142
00:09:12,990 --> 00:09:17,400
service has and you can see the language code for each of these languages.

143
00:09:17,550 --> 00:09:21,450
So if you want to try learning Macedonian or Malay,

144
00:09:21,720 --> 00:09:23,700
then this is going to be your best bet.

145
00:09:24,390 --> 00:09:27,450
So now that I've created my Excel sheet

146
00:09:27,480 --> 00:09:29,940
essentially of French and English words,

147
00:09:30,330 --> 00:09:35,330
I've got potentially a hundred flashcards with the front and back data already

148
00:09:35,790 --> 00:09:38,700
saved inside this Google sheet. Now,

149
00:09:38,700 --> 00:09:42,570
all I have to do is simply download it as a CSV

150
00:09:42,930 --> 00:09:46,110
and we'll be able to work with it very easily. Now,

151
00:09:46,140 --> 00:09:50,430
you don't have to worry about downloading this or getting hold of this because

152
00:09:50,430 --> 00:09:55,430
I've already included the final CSV data in the starting project for you to be

153
00:09:55,800 --> 00:09:56,633
able to use.
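
For a sense of how the CSV might be turned into flashcards, here is a minimal sketch using Python's built-in `csv` module. The column headers "French" and "English", the inline sample data and the `load_cards` helper are assumptions for illustration; the file in the starting project may be laid out differently.

```python
import csv
import io

# Hypothetical stand-in for the downloaded CSV file; the first row holds
# the assumed column headers.
csv_text = "French,English\ndemande,request\nloin,far\n"


def load_cards(file_obj):
    """Read rows into a list of dicts, one dict per flashcard."""
    return list(csv.DictReader(file_obj))


cards = load_cards(io.StringIO(csv_text))
front = cards[0]["French"]   # text for the front of the first card
back = cards[0]["English"]   # text for the back of the first card
```

With a real file, `open("your_file.csv", newline="", encoding="utf-8")` could be passed to `load_cards` in place of the `io.StringIO` object; the file name here is only a placeholder.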

154
00:09:57,270 --> 00:10:02,270
So head over to the next lesson and get started building your very own study

155
00:10:02,730 --> 00:10:05,460
aid, the flashy flashcard app.

