Exam 3 Topics

Image classification, NLP, time series, ethics How would you use this side Computer vision & everything past

Graphs

Protein Folding Problem

Molecular structure Look at a protein and figure out how it interacts with other proteins

Social Networks

Giant social graph

Computer Networks

Network graphs Find unusual traffic Where to shorten things

Graph Problems

  • Propose new connections
  • Find shortest route
  • Organize compute graphs
  • Analyze graph health
  • Predict new links
  • Locate communities
    • Get a famous person/the most popular person in that community to promote it in the community
  • Find bots

Graph Level Tasks

  • Describe properties of a graph
  • Predict how graph will behave
  • Find clusters/structure within the graph
  • Describe change over time In a graph-level task, our goal is to predict the property of an entire graph - for example, for a molecule represented as a graph, we might want to predict what the molecule smells like

Node Level Tasks

  • Classify nodes based on community A classic example of a node-level prediction problem is Zach’s karate club - single social network graph made up of individuals who have sworn allegiance to one of two karate instructors, a feud between Mr. Hi and John H creates a schism in the karate club, prediction problem is to classify whether a given member becomes loyal to either Mr. Hi or John H after the feud

Edge Level Tasks

  • Predict when an edge will form
  • Identify anomalous edges

Graph Terminology

In degree, out degree Temporal

  • Edges can be added over time
  • Nodes and edges can be added over time
  • Nodes and edges can be added and removed over time Multigraphs, hypergraphs

Featurizing the graph

Using the matrix itself doesn’t work because there are tons of adjacency matrices for a certain graph

Graphlet-based featurization

  • look at little structures, count up how many you see
  • kinda like one-hot encoding or histogram
  • problem: loses a lot of structure and important informatio

Simple features - graph

  • create features that are much simpleier
  • Persistence: the smallest number of links whose removal increases the diameter or disconnect the graph
  • Diameter: longest shortest path
    • longest path I can make between two nodes in the graph that is still the shortest path between them
    • The shortest path between the two furthest points on the graph (e.g. on campus)
    • e.g. the two people who are the furthest possibly apart - six degrees of separation
    • Will probably go through the most connected person in the world in the middle!
  • Average in degree/out degree
  • Nodes/edges added or deleted per time T
  • Distribution of degree over the network

Node-level features

  • In degree/out degree
  • Betweenness: number of shortest paths through that node
    • High betweenness: people who interact with multiple disparate communities
  • Degree centrality: degree
  • Closeness centrality: average distance from every node, from this node
    • How near the center they are
  • Eigenvector centrality

We would like to predict when a link forms in a graph

  • Recommend friends on Facebook
  • Figure out who will be new terrorists
  • Predict where money/resources will flow Measure how unexpected a link is
  • Fraudulent transactions
  • Suspicious network traffic Turn this into a machine learning problem

Local Similarity Based Measures

  • Common Neighbors: How many neighbors the nodes have in common
  • Jaccard Coefficient: $$ J(A,B)=\frac{|A \cap B|}{|A \cup B|}
- Adamic/Adar - Number of Length 3 paths between them MOST LINKS DON'T FORM! ### Node Embeddings Turn nodes into vectors What should it mean for two nodes to be closer together? If they're in the same social neighborhood, they should be in the same shared vector embedding space neighborhood -> NETWORK HOMOPHILY We want nodes that are in similar neighborhoods to be closer in the latent space. Nodes that fill similar roles should be closer together in the graph -> STRUCTURAL EQUIVALENCE e.g. we keep all the "popular kids" and all the "loners" together -> Random Walks A random walk is where you randomly walk around Do a big random walk, blank out one of the nodes, then train the neural network to predict it - learns to fill in valid paths in the graph - Once I do this $\forall$ nodes, what I actually have is a set of embeddings for every single node - Learn weights -> decent vector embedding #### Improvements Let's introduce two hyperparameters $p$ - return parameter, controls the likelihood of immediately returning to a node we just visited in a walk $q$ - the in-out parameter, controls how likely we are to stay in the neighborhood of node $u$, or are we more likely to visit others If we set up $p$ and $q$ in a BFS behavior, we'll end p with a structural-equivalent embedding DFS - going to learn local pathways ## Community Detection Algorithm Invented at UMD ### Network Topology - Understand where weak points in a network lie - Bridges: connecting two different communities are weak ties - An edge is a bridge if its removal results in disconnection of its terminal vertices. - Bridges - rare in real-life networks - Idea: relax the definition by checking if the distance between two terminal vertices increases if edge is removed - The larger the distance, the weaker the tie is - Community - things that have bridges between them ### Algorithm Remove as many bridges as possible, decompose the graph into a hierarchy of communities - Delete edges that will cause the shortest paths to increase - Keep doing that until I get disconnected components - First set: first of the hierarchy - Then look inside each community of disconnected components - Delete all of the bridges - most shortest paths - Keep deleting until we get disconnected components - Recursively rinse and repeat - Individual people -> base case Neighborhood overlap:

overlap(v_{i},v_{j})= \frac{|N_{i}\cap N_{j}|}{|N_{i}\cup N_{j}|-2}

## Graph Convolutional Networks