Download the Plain Text UTF-8 file of “The Fairy Tales of Charles Perrault” by Charles Perrault.
Question 1: Functional Programming
Create the following functions, and any other auxiliary functions you consider necessary, using the Functional Programming style:
• read_lines_in_text(fname): read the text file named fname into memory and return an iterable over its non-empty lines.
• define a collection of forbidden words that contains at least the following strings: ‘Illustration’, ‘*’, ‘#’, ‘_facing_’, ‘_page’.
• define a function that filters out lines containing any of the words in the collection of forbidden words.
• define a function to determine if a line is the title of a story, where a line is a title if it starts and ends with the underscore character.
• split_text(start_definition_func, text_iterator): use the function start_definition_func, passed as the first argument, to split a stream of lines into stories, and output a list of stories where each story is a list of lines. Make sure that the first line of each story is its title.
• use function composition to create a function that reads a text file, filters its lines, and outputs a list of stories.
• define a collection of forbidden titles that contains at least the following strings: ‘_The Moral_’, ‘_Another_’.
• define a function that filters out stories whose title is one of the entries in the collection of forbidden titles.
• define a function that filters out stories whose size, expressed as the number of lines, is not within the user-defined bounds min_num_lines and max_num_lines.
• use function composition to create a function that filters out stories that have a forbidden title or do not satisfy the size constraints.
• transform_into_sentences(story): given a story, i.e. a list of strings, convert the text to lowercase and transform the story into a list of sentences, where a sentence is a string delimited by the full stop character (.).
• use the pipeline function to create a list of stories from the fairy tale text in which lines containing forbidden words are excluded, stories with a forbidden title are dropped, only stories with between 10 and 200 lines are kept, and each story is organized as a list of sentences.
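A minimal Python sketch of one possible Question 1 solution in functional style follows. Only read_lines_in_text, split_text and transform_into_sentences are names fixed by the assignment; compose, pipeline, has_forbidden_word, filter_lines, is_title, filter_titles, filter_size, filter_stories, read_stories and make_story_pipeline are hypothetical helpers, and the course's own pipeline function may have a different signature. The sketch also assumes that the file is UTF-8, that forbidden words match as substrings, that a forbidden title must equal a story's first line exactly, that the size bounds are inclusive, that any text before the first title is discarded, and that a story's lines (title included) are joined with spaces before splitting on full stops.

```python
from functools import partial, reduce

# Hypothetical generic helpers; the course may supply its own `pipeline`.
def compose(*funcs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)

def pipeline(*funcs):
    """Left-to-right composition: pipeline(f, g)(x) == g(f(x))."""
    return compose(*reversed(funcs))

def read_lines_in_text(fname):
    """Read the whole file into memory; return a lazy iterable of stripped, non-empty lines."""
    with open(fname, encoding="utf-8") as fh:
        text = fh.read()
    return filter(None, map(str.strip, text.splitlines()))

FORBIDDEN_WORDS = ("Illustration", "*", "#", "_facing_", "_page")

def has_forbidden_word(line, words=FORBIDDEN_WORDS):
    return any(word in line for word in words)

def filter_lines(lines, words=FORBIDDEN_WORDS):
    """Drop every line that contains at least one forbidden word (substring match)."""
    return filter(lambda line: not has_forbidden_word(line, words), lines)

def is_title(line):
    """A title starts and ends with '_' (a lone '_' is not treated as a title)."""
    return len(line) > 1 and line.startswith("_") and line.endswith("_")

def split_text(start_definition_func, text_iterator):
    """Fold the line stream into a list of stories, each starting with its title."""
    def step(stories, line):
        if start_definition_func(line):
            return stories + [[line]]                     # open a new story
        if stories:
            return stories[:-1] + [stories[-1] + [line]]  # extend the current story
        return stories                                    # skip text before the first title
    return reduce(step, text_iterator, [])

# File name -> list of stories, each a list of lines.
read_stories = pipeline(read_lines_in_text, filter_lines, partial(split_text, is_title))

FORBIDDEN_TITLES = ("_The Moral_", "_Another_")

def filter_titles(stories, titles=FORBIDDEN_TITLES):
    """Keep stories whose title (first line) is not exactly a forbidden title."""
    return filter(lambda story: story[0] not in titles, stories)

def filter_size(stories, min_num_lines, max_num_lines):
    """Keep stories whose line count lies in [min_num_lines, max_num_lines]."""
    return filter(lambda story: min_num_lines <= len(story) <= max_num_lines, stories)

def filter_stories(min_num_lines, max_num_lines, titles=FORBIDDEN_TITLES):
    """Compose both story filters (title filter applied first, then size)."""
    return compose(
        partial(filter_size, min_num_lines=min_num_lines, max_num_lines=max_num_lines),
        partial(filter_titles, titles=titles),
    )

def transform_into_sentences(story):
    """Join the lines, lowercase, split on '.', and drop empty fragments."""
    text = " ".join(story).lower()
    return list(filter(None, map(str.strip, text.split("."))))

def make_story_pipeline(min_num_lines=10, max_num_lines=200):
    """File name -> filtered stories, each as a list of lowercase sentences."""
    return pipeline(
        read_stories,
        filter_stories(min_num_lines, max_num_lines),
        partial(map, transform_into_sentences),
        list,
    )

# Usage (the file name is hypothetical):
# stories = make_story_pipeline(10, 200)("perrault.txt")
```

split_text builds new lists instead of mutating them, which keeps each step pure at the cost of copying the current story on every line; for a single book this should be acceptable, while a grouping based on itertools would avoid the copying.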
Question 2: NumPy
Create the following functions, and any other auxiliary functions you consider necessary, using the functionality offered by the NumPy library:
• code(item, nbits=10): convert a tuple of strings into an integer in the range [0, 2^b), where b = nbits is the number of bits. You may use the built-in function hash(object).
• vectorize_line(line, k=3, nbits=10): convert a string into a one-dimensional NumPy array of size 2^b. The conversion should be performed as follows: extract all possible subsequences of k consecutive words from the input string; convert each k-word tuple into an integer using the previous function, code; use the resulting integer p as a position index; count how many of the line's k-word tuples map to p; this count is the value at position p of the returned array.
• vectorize_story(story, k=3, nbits=10): convert a story, which is a list of n strings, into a two-dimensional NumPy array of size n × 2^b using vectorize_line.
• convert all stories in the text into a list of two-dimensional NumPy arrays and assign it to the variable mtxs.
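A possible NumPy sketch of Question 2 follows, assuming that "subsequences of k words" means runs of k consecutive whitespace-separated words (k-grams). np.bincount does the counting: entry p is the number of k-grams whose code is p, and minlength forces the length to 2^b. Because code reduces hash values modulo 2^b, distinct k-grams can collide and share a position (the usual feature-hashing trade-off). Python salts str hashes per process (see PYTHONHASHSEED), so the vectors are consistent within one run but not necessarily across runs. The stories list at the end is a hypothetical stand-in for the Question 1 output.

```python
import numpy as np

def code(item, nbits=10):
    """Hash a tuple of strings into an integer in [0, 2**nbits).

    % with a positive modulus is non-negative in Python, so negative hashes are fine.
    """
    return hash(tuple(item)) % (2 ** nbits)

def vectorize_line(line, k=3, nbits=10):
    """Count the hashed k-grams (runs of k consecutive words) of a line in a 2**nbits vector."""
    words = line.split()
    # Empty when the line has fewer than k words.
    kgrams = [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]
    positions = np.array([code(g, nbits) for g in kgrams], dtype=np.intp)
    # result[p] = number of k-grams whose code is p; minlength fixes the size to 2**nbits.
    return np.bincount(positions, minlength=2 ** nbits)

def vectorize_story(story, k=3, nbits=10):
    """One row per string of the story: shape (n, 2**nbits)."""
    rows = [vectorize_line(s, k, nbits) for s in story]
    return np.vstack(rows) if rows else np.zeros((0, 2 ** nbits), dtype=np.intp)

# Hypothetical stand-in for the Question 1 output (stories as lists of sentences).
stories = [["once upon a time there was", "a king and a queen"], ["the end"]]
mtxs = [vectorize_story(story) for story in stories]
```

With the real data, stories would be the list produced by the Question 1 pipeline, and each matrix in mtxs has one row per sentence.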