# Data Processing

Data processing and visualization

Create the following functions, plus any auxiliary functions you consider necessary, using the Matplotlib and NumPy libraries where needed:
• pairwise_distance(vec1, vec2): compute the Euclidean distance between two 1-dimensional NumPy arrays $v$, $u$ as $d(v, u) = \sqrt{\sum_i (v_i - u_i)^2}$, where $v_i$ denotes the $i$-th entry of the array.
• pairwise_distances(vec, mtx): compute a 1-dimensional NumPy array containing the distance of vec from each row of mtx, which is a 2-dimensional NumPy array of size $n \times 2^b$.
• mean(mtx): compute the mean of a 2-dimensional NumPy array of size $n \times 2^b$ as a 1-dimensional NumPy array of size $2^b$.
• plot_hists(dist_vecs): plot the histograms of a list of 1-dimensional NumPy arrays (one histogram per array) on the same figure.
• plot_comparison(mtxs): plot the comparison between each non-redundant pair of distinct stories. A single comparison between two stories $s_i$, $s_j$ is the output of plot_hists applied to the following three arrays: (1) the distance of each sentence in story $s_i$ from the average sentence in story $s_i$; (2) the distance of each sentence in story $s_j$ from the average sentence in story $s_j$; (3) the distance of each sentence in story $s_j$ from the average sentence in story $s_i$.
• Execute the plot_comparison function for stories vectorized with parameters k=2 and nbits=15.
[20 marks]
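
The five functions listed above could be sketched roughly as follows. This is one possible reading of the specification, not a reference solution. It assumes that each story is already available as a dense 2-dimensional NumPy array with one row per sentence (the vectorization step is not shown here), and that "non-redundant pair of distinct stories" means every unordered pair with i < j. The labels and title arguments of plot_hists, the choice of 20 bins, and the returned figure objects are illustrative additions that the task does not ask for.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np


def pairwise_distance(vec1, vec2):
    """Euclidean distance between two 1-dimensional arrays of equal length."""
    diff = np.asarray(vec1, dtype=float) - np.asarray(vec2, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


def pairwise_distances(vec, mtx):
    """Distance of vec from each row of mtx, as a 1-dimensional array."""
    # Broadcasting subtracts vec from every row of mtx.
    diff = np.asarray(mtx, dtype=float) - np.asarray(vec, dtype=float)
    return np.sqrt(np.sum(diff ** 2, axis=1))


def mean(mtx):
    """Column-wise mean of a 2-dimensional array (the 'average sentence')."""
    return np.asarray(mtx, dtype=float).mean(axis=0)


def plot_hists(dist_vecs, labels=None, title=None):
    """Overlay one histogram per array on a single figure and return it.

    labels and title are optional extras, not part of the required signature.
    """
    fig, ax = plt.subplots()
    for idx, dists in enumerate(dist_vecs):
        label = labels[idx] if labels is not None else f"array {idx + 1}"
        # Semi-transparent bars keep overlapping histograms readable.
        ax.hist(dists, bins=20, alpha=0.5, label=label)
    ax.set_xlabel("Euclidean distance")
    ax.set_ylabel("count")
    if title is not None:
        ax.set_title(title)
    ax.legend()
    return fig


def plot_comparison(mtxs):
    """Call plot_hists once per unordered pair of distinct stories (i < j)."""
    figs = []
    for i, j in itertools.combinations(range(len(mtxs)), 2):
        mean_i, mean_j = mean(mtxs[i]), mean(mtxs[j])
        dist_vecs = [
            pairwise_distances(mean_i, mtxs[i]),  # story i vs. its own mean
            pairwise_distances(mean_j, mtxs[j]),  # story j vs. its own mean
            pairwise_distances(mean_i, mtxs[j]),  # story j vs. mean of story i
        ]
        labels = [
            f"s{i} from mean of s{i}",
            f"s{j} from mean of s{j}",
            f"s{j} from mean of s{i}",
        ]
        figs.append(plot_hists(dist_vecs, labels, title=f"stories {i} and {j}"))
    return figs
```

With a list mtxs of story matrices in hand, calling plot_comparison(mtxs) followed by plt.show() would display one figure per pair of stories.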
Question 2: Analysis
Describe, in at most 200 words, what information can be deduced from the plots produced by plot_comparison. Also discuss what changes when the comparison is run on the same stories vectorized with parameters k=1 and k=3 and nbits=15.
