Graph Analytics for Big Data Coursera Quiz Answers

Once you have finished this course, you will be able to model a problem as a graph database and perform analytical tasks over the graph in a scalable manner. Better still, you will be able to apply these techniques to understand the significance of your data sets for your own projects.

Quiz 1 – Introduction to Graphs

Q1. Which of the following are graphs? (check all that apply)

Q2. Which of the following is the correct adjacency matrix for this graph?

  • Neither option is correct.
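
To check an adjacency matrix by hand, it can help to build one from an edge list. The sketch below is a minimal illustration: the four-node graph is a made-up example (not the graph pictured in the quiz), and `adjacency_matrix` is a hypothetical helper name.

```python
def adjacency_matrix(nodes, edges, directed=True):
    """Build a 0/1 adjacency matrix: row = source node, column = target node."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
        if not directed:
            # An undirected edge is recorded in both directions.
            matrix[index[dst]][index[src]] = 1
    return matrix


# Hypothetical example graph, not the one from the quiz image.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
matrix = adjacency_matrix(nodes, edges)
for name, row in zip(nodes, matrix):
    print(name, row)
```

For an undirected graph the matrix is symmetric, which is a quick sanity check when comparing candidate answers.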

Q3. Which of the following content would be objects (or nodes) in a graph that represents the activity in a Facebook page?

  • Created_post (the action of creating a post)
  • post text
  • comment text
  • location
  • friends (the action of making someone your friend)

Q4. Based on the videos, which kinds of analysis might one be able to perform on a tweet graph?

  • find interacting groups of users
  • extract conversation threads
  • find influencers in a Twitter community

Q5. The key reason mentioned in the video that biology applications need Big Data analytics is…

  • The integration of multiple data sources from different researchers and of different sources of information.
  • The complexity of interactions that correlate to inform phenotypes.
  • The new use of computational techniques to explore new areas of biology research more quickly than can be done with “live” or wetlab experiments.

Q6. Which of the Vs BEST describes the constant increase in the number of edges in a graph, which sometimes makes it hard to know when one has found “an answer” to one’s analysis question?

  • Variety
  • Volume
  • Velocity
  • Valence

Q7. Which of the Vs results in increased algorithmic complexity (which can cause analyses to not be able to finish running in reasonable amounts of time)?

  • Valence
  • Velocity
  • Volume
  • Variety

Q8. Which of the Vs results in challenges due to graphs created from varying kinds, formats, sources, and meanings of data?

  • Variety
  • Valence
  • Volume
  • Velocity

Q9. Which of the Vs causes increased interconnectivity of a graph — which can cause problems in analysis due to density?

  • Velocity
  • Variety
  • Volume
  • Valence

Q10. Updating a graph with a stream of posting information on Facebook is an example of which of the Vs?

  • Velocity
  • Volume
  • Variety
  • Valence

Q11. Studying Amarnath’s Gmail interactions over time (as Gmail started to be used by more and more people) is BEST defined as an impact of which of the Vs?

  • Valence
  • Velocity
  • Variety
  • Volume

Quiz 2 – Graph Analytics Applications

Q1. A graph representing tweets would have only “one type” (e.g. label) of node.

  • True
  • False

Q2. In a network representing the World Wide Web, nodes would likely represent:

  • Hyperlinks
  • Webpages
  • Google search terms
  • Individual computers

Q3. In a network representing the World Wide Web, edges (or links) would likely represent:

  • Hyperlinks
  • Webpages
  • Google search terms
  • Individual computers

Q4. In an email network, which might reasonably be represented by weight on edges?

  • average number of emails sent from one user to another in a week
  • the total number of emails sent by one user in a week
  • the total number of people who sent an email in a week

Q5. A loop in a graph is where:

  • when there is an edge from A->B, there is also an edge from B->A.
  • where there is a path from a node, through 1 or more other nodes, back to the original node.
  • where there is an edge from a node to itself.

Q6. An example of a loop in a graph could occur when:

  • Someone emails themself
  • Someone emails a friend who replies
  • Someone emails a friend, who emails another friend, who then replies to you

Q7. When trying to represent a relationship between Maria and Julio, who have more than one relationship to each other (e.g., tennis partner, co-worker, emergency contact), which of the following would be needed in a graph representing those relationships?

  • Separate graphs for each kind of relationship
  • Multiple nodes for each of Maria and Julio, to capture the various relationships
  • Multiple edges between Maria and Julio

Q8. In many applications paths (where we go from one node to another without repeating nodes) are more useful than walks (where we can repeat a node when going from one node to another).

  • True
  • False

Q9. Trails (paths without repeated edges) can be interesting in which of the following problem applications?

  • Routing to avoid visiting the same city.
  • An email network tracing frequency of emails from one person to another.
  • An email network tracing email replies.
  • Routing to avoid using the same bridge or road.
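
The distinction between walks, trails, and paths used in Q8 and Q9 can be sketched in a few lines of Python. This is an illustrative sketch only: `classify_route` is a hypothetical helper, edges are assumed to be directed, and the function returns the most specific label (every path is also a trail, and every trail is also a walk).

```python
def classify_route(nodes):
    """Classify a node sequence as 'path', 'trail', or 'walk' (directed edges assumed)."""
    edges = list(zip(nodes, nodes[1:]))
    if len(set(nodes)) == len(nodes):
        return "path"   # no node is visited twice
    if len(set(edges)) == len(edges):
        return "trail"  # nodes may repeat, but no edge is reused
    return "walk"       # both nodes and edges may repeat


print(classify_route(["A", "B", "C"]))            # no repeated node
print(classify_route(["A", "B", "C", "A", "D"]))  # node A repeats, edges do not
print(classify_route(["A", "B", "A", "B"]))       # edge A->B is used twice
```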

Q10. Suppose we have an email network where the edges of a graph represent the number of emails from one user to another.

If I was going to ask if Maria had sent any emails that (either directly or through forwarding from others) reached Julio, I would ask if:

  • Julio’s node was reachable from Maria node
  • Maria’s node was reachable from Julio’s node

Q11. If I want to find the diameter of a graph, I should start by finding the shortest path between each set of nodes.

  • True
  • False

Q12. What is the diameter of this graph?

  • 1
  • 2
  • 3
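
As a rough illustration of the relationship between shortest paths and the diameter, the sketch below computes the diameter of a small unweighted graph as the longest of all shortest paths. The graph is a made-up four-node chain (not the one from the quiz image), the graph is assumed to be connected and undirected, and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, start):
    """Hop counts from start to every reachable node in an unweighted graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(graph):
    """Longest shortest path over all node pairs (assumes a connected graph)."""
    return max(max(bfs_distances(graph, node).values()) for node in graph)


# Hypothetical undirected chain A - B - C - D, stored as adjacency lists.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(diameter(chain))
```

Running one breadth-first search per node is fine for small graphs; large graphs usually call for approximate or distributed methods.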

Q13. This question is about “best paths”. To find the most discussed email in an email network, would we be looking to minimize a function or maximize a function?

  • Maximize
  • Minimize

Q14. Which are the two kinds of constraints on paths discussed in the video on basic path analytics? (check 2) Hint: remember the example of Amarnath needing to get to work by taking his son to school.

  • Directionality
  • Inclusion of nodes and/or edges
  • Exclusion of nodes and/or edges

Q15. What are examples of preference constraints in the Google Maps application?

  • Avoid roads under construction
  • Avoid highways
  • Include son’s school

Q16. Which of the statements below is true?

  • Dijkstra’s algorithm is computationally inefficient (has high computational complexity).
  • Dijkstra’s algorithm is computationally efficient (has low computational complexity).
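
For reference, here is a minimal sketch of Dijkstra's algorithm using Python's standard `heapq` module. The road network and its weights are invented for illustration, and edge weights are assumed to be non-negative. With a binary heap, this version runs in O((V + E) log V) time.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> {neighbor: weight >= 0}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical road network with travel times as weights.
roads = {
    "home": {"school": 4, "work": 10},
    "school": {"home": 4, "work": 3},
    "work": {},
}
distances = dijkstra(roads, "home")
print(distances)
```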

Q17. In the video on “Inclusion and Exclusion Constraints” we learn that adding constraints can actually make our analysis job easier. For example, when we require that a given node be included on a path, which of the following impacts now make the analysis job easier? (Choose 2)

  • Changing the weights on the edges of the graph and/or subgraphs
  • Splitting the task into 2 independent shortest path problems
  • Reduction of the size of the graph
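
One way to picture an inclusion constraint is to solve two independent shortest-path problems, one from the start to the required node and one from the required node to the destination, and add the results. The sketch below does this with hop counts on a made-up street graph; the function names and the graph are assumptions, and the combined route is the shortest walk through the required node (it may revisit a node).

```python
from collections import deque


def hops(graph, start, goal):
    """Fewest edges from start to goal in an unweighted directed graph, or None."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return None


def hops_via(graph, start, must_visit, goal):
    """Solve start -> must_visit and must_visit -> goal separately, then add."""
    first = hops(graph, start, must_visit)
    second = hops(graph, must_visit, goal)
    if first is None or second is None:
        return None
    return first + second


# Hypothetical one-way street graph.
streets = {
    "home": ["work", "park"],
    "park": ["school"],
    "school": ["work"],
    "work": [],
}
direct = hops(streets, "home", "work")
constrained = hops_via(streets, "home", "school", "work")
print(direct, constrained)
```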

Quiz 3 – Connectivity, Community, and Centrality Analytics

Q1. The example given in the lectures of when a power network loses power in large portions of its service area was an example of what?

  • a problem that can occur when centrality is too high
  • an attack which causes disconnection of the graph
  • high levels of connectivity which make it easy to bring a network down

Q2. Is the following graph strongly connected, weakly connected or neither?

<image: graph>

  • neither
  • strongly connected
  • weakly connected

Q3. Is the following graph strongly connected, weakly connected or neither?

<image: graph>

  • neither
  • strongly connected
  • weakly connected
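
The difference between strong and weak connectivity can be checked mechanically. The following sketch is an illustration under stated assumptions: the three example graphs are invented (they are not the graphs from the quiz images), every node appears as a key in the adjacency dictionary, the graph is non-empty, and the function names are hypothetical.

```python
def reachable(graph, start):
    """Set of nodes reachable from start by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def classify(graph):
    """Return 'strongly connected', 'weakly connected', or 'neither'."""
    nodes = set(graph)
    # Strongly connected: every node reaches every other node along edge directions.
    if all(reachable(graph, node) == nodes for node in nodes):
        return "strongly connected"
    # Weakly connected: connected once edge directions are ignored.
    undirected = {node: set(graph[node]) for node in nodes}
    for node in nodes:
        for neighbor in graph[node]:
            undirected[neighbor].add(node)
    if reachable(undirected, next(iter(nodes))) == nodes:
        return "weakly connected"
    return "neither"


cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}
chain = {"A": ["B"], "B": ["C"], "C": []}
split = {"A": ["B"], "B": [], "C": []}
print(classify(cycle), classify(chain), classify(split))
```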

Q4. If you were going to look for a node which would be most likely to be the target of an attack to disconnect a network, what would be the best characteristic to look for?

  • high degree nodes
  • low degree nodes
  • nodes that, if they were removed, would cause the graph to go from strongly connected to weakly connected

Q5. What is the out-degree of node B?

<image: graph>

  • 0
  • 1
  • 2
  • 3

Q6. In the graph below, which node is the greatest listener?

<image: graph>

  • A
  • B
  • C
  • D

Q7. In the graph below, which nodes are the greatest communicators? (Hint: there’s a tie)

<image: graph>

  • A
  • B
  • C
  • D
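
In-degree and out-degree are easy to tabulate from an edge list. The sketch below assumes, as these questions suggest, that a "listener" is a node with high in-degree and a "communicator" is a node with high out-degree. The names and edges are invented for illustration and are not the graph from the quiz images.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) counters for a list of directed edges."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    return in_degree, out_degree


# Hypothetical "who sends messages to whom" edges.
edges = [
    ("ann", "bob"),
    ("cat", "bob"),
    ("dan", "bob"),
    ("ann", "cat"),
    ("dan", "ann"),
]
in_degree, out_degree = degrees(edges)

top_listener = max(in_degree, key=in_degree.get)
most_sent = max(out_degree.values())
top_communicators = sorted(n for n, d in out_degree.items() if d == most_sent)
print(top_listener, top_communicators, out_degree["bob"])
```

A `Counter` returns 0 for a node that never appears as a source, which is how a node with no outgoing edges shows up here.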

Q8. What would we be looking for if we followed the steps below? Note: we have 2 graphs.

Create a table for each graph where, for each node, you list the degree of the node. For each graph, create a histogram indicating how many nodes in that graph have a specific degree (e.g., how many nodes have degree 1? 2? etc.). Use advanced approaches (e.g. Euclidean distances) to compare these two histograms.

  • Connectivity
  • Centrality
  • Similarity
  • Community
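
The three steps described in Q8 can be sketched as follows. This is a minimal illustration with two invented undirected graphs; the function names are hypothetical, and isolated nodes (degree 0) are not represented because the input is an edge list.

```python
import math
from collections import Counter


def degree_histogram(edges):
    """Map degree -> number of nodes with that degree (undirected edges)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def histogram_distance(h1, h2):
    """Euclidean distance between two degree histograms."""
    keys = set(h1) | set(h2)
    return math.sqrt(sum((h1[k] - h2[k]) ** 2 for k in keys))


# Two hypothetical graphs: a star and a chain.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
chain = [("a", "b"), ("b", "c"), ("c", "d")]
star_hist = degree_histogram(star)
chain_hist = degree_histogram(chain)
distance = histogram_distance(star_hist, chain_hist)
print(dict(star_hist), dict(chain_hist), round(distance, 3))
```

A distance of 0 means the two graphs have identical degree distributions, although that alone does not prove the graphs have the same structure.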

Q9. Which of the following are the three types of analytics questions asked about communities?

  • Static
  • Evolution
  • Prediction
  • Connection

Q10. What type of community analytics question is the following?

Did a community form on Twitter around the 2014 World Cup in Brazil?

  • Static
  • Prediction
  • Connection
  • Evolution

Q11. Which type of community analytics question is the following?

How tightly knit was the 2014 World Cup Twitter community on July 13, 2014 (the day of the finals)?

  • Static
  • Evolution
  • Prediction
  • Connection

Q12. What is the external degree of the node indicated in the graph below?

<image: graph>

  • 1
  • 2
  • 3
  • 4

Q13. Which of the two graphs below is more modular?

<image: graphs A and B>

  • A
  • B

Q14. Which of the following community tracking phases usually occurs when a company spins off a start-up?

  • Split
  • Birth
  • Death
  • Grow
  • Merge
  • Contract

Q15. An influencer in a network is defined as:

  • a node which can reach all other nodes quickly
  • the biggest gossip in the network
  • a node which has heavy weight edges to at least 1/2 of the nodes in the network

Q16. Which of the following are the 2 core “key player” problems that centrality analytics can address?

  • A set of nodes which can reach (almost) all other nodes
  • What is the shortest path through a network
  • Which nodes’ removal will maximally disrupt the network
  • Which nodes have the highest ratio of out-degree nodes to in-degree nodes

Q17. What kind of centrality would you want to analyze in a graph if you wanted to inject information that flows through the shortest path in a network and have it spread quickly?

  • Degree
  • Group
  • Closeness
  • Between-ness

Q18. What kind of centrality would you want to analyze in a graph if you wanted to maximize commodity flow in a network?

  • Group
  • Degree
  • Closeness
  • Between-ness

Q19. What kind of centrality identifies “hubness”?

  • Between-ness
  • Closeness
  • Degree
  • Group
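
To make two of the centrality measures above concrete, the sketch below computes degree centrality (a raw neighbor count) and closeness centrality, taken here as (n - 1) divided by the sum of shortest-path distances to all other nodes. The graph is invented, it is assumed to be connected and undirected, and the function names are hypothetical. Betweenness and group centrality are not covered.

```python
from collections import deque


def bfs_distances(graph, start):
    """Hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def degree_centrality(graph):
    """Number of neighbors per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """(n - 1) / sum of distances to all other nodes; assumes a connected graph."""
    n = len(graph)
    return {
        node: (n - 1) / sum(bfs_distances(graph, node).values())
        for node in graph
    }


# Hypothetical undirected graph: a hub with three neighbors and a tail c - d.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree)
print({node: round(value, 3) for node, value in closeness.items()})
```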

Quiz 4 – Graph Analytics with Neo4j

Q1. Which of the following is a Cypher command used to combine two or more query results?

  • union
  • combine
  • merge
  • return

Q2. For a graph network whose nodes are all of type “MyNode”, which has both incoming and outgoing edges, and which has both root and leaf nodes, what will the following Cypher code return in a Neo4j report?

match (n:MyNode)<-[r]-() return n

  • All nodes and edges except leaf nodes and their edges.
  • The entire network, all nodes and edges
  • All nodes except root nodes.
  • Edges but no nodes.

Q3. The Cypher query language shares some commands in common with SQL.

  • True
  • False

Q4. The following query will return a graph containing whatever loops might exist.

match (n)-[r]-(n) return n, r

  • True
  • False

Q5. Which Cypher pattern is used to represent a node?

  • ()
  • []
  • {}
  • <>

Q6. Neo4j is a …

  • Graph database
  • Relational database
  • None of the above

Q7. Which Cypher command launches a Neo4j database search?

  • MATCH
  • RETURN
  • CREATE
  • None of the above

Q8. Cypher does not include a specific command to find the shortest path in a graph network.

  • False
  • True

Q9. Cypher includes a ‘diameter’ command to find the longest path in a graph network.

  • False
  • True

Quiz 5 – Assessment Questions on ‘Practicing Graph Analytics in Neo4j With Cypher’

Q1. What is the number of nodes returned?

  • 50,000
  • 9656
  • 9756
  • 8673

Q2. What’s the number of edges?

  • 50,000
  • 49,834
  • 46,621
  • None of the above

Q3. The number of loops in the graph is:

  • 1035
  • 1395
  • 1221
  • 1243

Q4. The query match (n)-[r]->(m) where m <> n return distinct n, m, count(r) gives us

  • the count of all non loop edges between every adjacent node pair.
  • the count of all edges between every adjacent node pair.
  • the count of all edges.
  • None of the above

Q5. The query match (n)-[r]->(m) where m <> n return distinct n, m, count(r) as myCount order by myCount desc limit 1 produces what?

  • a random edge
  • the node with the maximum number of looping edges
  • two neighboring nodes, each with a high outdegree
  • the pair of nodes with the maximum number of multi-edges between them
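
For readers more comfortable with Python than Cypher, the sketch below mimics what the two queries above compute: it groups directed edges by (source, target) pair, skips loops, and picks the pair with the highest count. The edge list is invented and `busiest_pair` is a hypothetical name; this is an analogy, not how Neo4j executes the query.

```python
from collections import Counter


def pair_counts(edges):
    """Count edges per (source, target) pair, ignoring loops (source == target)."""
    return Counter((src, dst) for src, dst in edges if src != dst)


def busiest_pair(edges):
    """Pair with the most parallel edges, like ORDER BY count DESC LIMIT 1."""
    return pair_counts(edges).most_common(1)[0]


# Hypothetical multigraph with two loops and a triple edge A -> B.
edges = [
    ("A", "B"),
    ("A", "B"),
    ("A", "A"),
    ("B", "C"),
    ("A", "B"),
    ("C", "C"),
]
pair, count = busiest_pair(edges)
print(pair, count)
```

Because the pattern is directed, (A, B) and (B, A) are counted as different pairs.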

Q6. The query match p=(n {Name:'BRCA1'})-[:AssociationType*..2]->(m) return p produces what?

  • The neighbors of the node whose name is ‘BRCA1’
  • The 2-neighborhood of the node whose name is ‘BRCA1’
  • The neighbors’ neighbors of the node whose name is ‘BRCA1’
  • The neighbors whose distance is greater than 1 and less than 2 of the node whose name is ‘BRCA1’

Q7. How many non-directed shortest paths are there between the node named ‘BRCA1’ and the node named ‘NBR1’?

  • 8
  • 9
  • 10
  • None of the above

Q8. The top 2 nodes with the highest outdegree are:

  • GRB2 and TP53
  • EP300 and BRCA1
  • MEPCE and EGFR
  • SNCA and BRCA1

Q9. Applying the example queries provided to you, create the degree histogram for the network. How many nodes in the graph have a degree of 3?

  • 1351
  • 821
  • 675
  • 512

Quiz 6 – Using GraphX

Q1. In this code snippet below from the Hands On exercise on importing data, ‘100L + row…’ adds 100 to the value of every country ID. Which of the following statements are true regarding this decision? (Note: you may select more than one)

val countries: RDD[(VertexId, PlaceNode)] =
  sc.textFile("./EOADATA/country.csv").
    filter(! _.startsWith("#")).
    map { line =>
      val row = line split ','
      (100L + row(0).toInt, Country(row(1)))
    }

  • Another option would have been to add 100 to the metropolis keys as they were imported, and leave the country keys as they were originally numbered.
  • This step was needed to create unique keys between the country and the metropolis datasets.
  • Another option would be to add 500 to the country keys.

Q2. In the metro example, what is an in-degree in relation to a country? Hint: this was covered in the Building a Degree Histogram Hands-On exercise.

  • A street in a city.
  • Another city.
  • A continent.
  • A metro area or metropolis.

Q3. In the Hands-On exercise on network connectedness and clustering, Antarctica was easy to identify. Why?

  • It had many edges
  • It had a vertex ID of 205.
  • It is the green dot that has no connections, or it is the least connected cluster.

Q4. In the Facebook graph example, the visualization looked like broccoli. Why?

  • In a directed graph, the stalks are large.
  • Social networks have communities or pockets of people who interact densely.
  • The high centrality of some people nodes in Facebook gives the graph its broccoli shape.

Review:

Based on our experience, we recommend enrolling in this course to learn new skills from specialists. We trust it will be worthwhile.

 
