Part of TensorFlow's vision of making machine learning and deep learning easier to
learn and easier to use is the concept of having
built-in datasets. You saw a little bit
of a preview of this way back in the first course, when Fashion MNIST was
available to you without you needing to download and split the data into
training and test sets. Expanding on this,
there's a library called TensorFlow Datasets,
or TFDS for short, that contains many datasets in lots of different categories. Here are some examples,
and while we can see that there are many datasets
of different types, particularly image-based ones,
there are also a few for text, and we'll be using
the IMDB reviews dataset next. This dataset is ideal because it contains
a large body of text: 50,000 movie reviews, which are categorized as
positive or negative. It was authored by
Andrew Maas et al. at Stanford, and you can learn more
about it at this link.
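
To make that concrete, here is a minimal sketch of how the IMDB reviews dataset might be loaded through TFDS. It assumes the tensorflow-datasets package is installed and that the dataset is registered under the name "imdb_reviews". The helper names load_imdb_reviews and describe_label are made up for this illustration and are not part of the library. The first call downloads the data, so it needs a network connection.

```python
# Integer labels used by the IMDB reviews dataset in TFDS.
LABEL_NAMES = {0: "negative", 1: "positive"}


def describe_label(label):
    """Map an integer label (0 or 1) to a readable sentiment name."""
    return LABEL_NAMES[int(label)]


def load_imdb_reviews():
    """Return the train split, the test split, and the dataset metadata.

    Hypothetical helper: tensorflow_datasets is imported lazily so that
    this module can be imported even where the package is not installed.
    """
    import tensorflow_datasets as tfds

    (train_data, test_data), info = tfds.load(
        "imdb_reviews",
        split=["train", "test"],
        as_supervised=True,  # yield (text, label) pairs instead of dicts
        with_info=True,      # also return metadata such as split sizes
    )
    return train_data, test_data, info


if __name__ == "__main__":
    train_data, test_data, info = load_imdb_reviews()
    # Report how many labeled reviews each split contains.
    print(info.splits["train"].num_examples, "training reviews")
    print(info.splits["test"].num_examples, "test reviews")
    # Peek at one review and its sentiment label.
    for text, label in train_data.take(1):
        print(describe_label(label.numpy()), text.numpy()[:80])
```

Notice that, just as with Fashion MNIST, the training and test splits come back ready to use, so there is no manual download or splitting step.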