Exam 3 Topics
Image classification, NLP, time series, ethics How would you use this side Computer vision & everything past
Graphs
Protein Folding Problem
Molecular structure Look at a protein and figure out how it interacts with other proteins
Social Networks
Giant social graph
Computer Networks
Network graphs Find unusual traffic Where to shorten things
Graph Problems
- Propose new connections
- Find shortest route
- Organize compute graphs
- Analyze graph health
- Predict new links
- Locate communities
- Get a famous person/the most popular person in that community to promote it in the community
- Find bots
Graph Level Tasks
- Describe properties of a graph
- Predict how graph will behave
- Find clusters/structure within the graph
- Describe change over time In a graph-level task, our goal is to predict the property of an entire graph - for example, for a molecule represented as a graph, we might want to predict what the molecule smells like
Node Level Tasks
- Classify nodes based on community A classic example of a node-level prediction problem is Zach’s karate club - single social network graph made up of individuals who have sworn allegiance to one of two karate instructors, a feud between Mr. Hi and John H creates a schism in the karate club, prediction problem is to classify whether a given member becomes loyal to either Mr. Hi or John H after the feud
Edge Level Tasks
- Predict when an edge will form
- Identify anomalous edges
Graph Terminology
In degree, out degree Temporal
- Edges can be added over time
- Nodes and edges can be added over time
- Nodes and edges can be added and removed over time Multigraphs, hypergraphs
Featurizing the graph
Using the matrix itself doesn’t work because there are tons of adjacency matrices for a certain graph
Graphlet-based featurization
- look at little structures, count up how many you see
- kinda like one-hot encoding or histogram
- problem: loses a lot of structure and important informatio
Simple features - graph
- create features that are much simpleier
- Persistence: the smallest number of links whose removal increases the diameter or disconnect the graph
- Diameter: longest shortest path
- longest path I can make between two nodes in the graph that is still the shortest path between them
- The shortest path between the two furthest points on the graph (e.g. on campus)
- e.g. the two people who are the furthest possibly apart - six degrees of separation
- Will probably go through the most connected person in the world in the middle!
- Average in degree/out degree
- Nodes/edges added or deleted per time T
- Distribution of degree over the network
Node-level features
- In degree/out degree
- Betweenness: number of shortest paths through that node
- High betweenness: people who interact with multiple disparate communities
- Degree centrality: degree
- Closeness centrality: average distance from every node, from this node
- How near the center they are
- Eigenvector centrality
Link prediction & Anomaly detection
We would like to predict when a link forms in a graph
- Recommend friends on Facebook
- Figure out who will be new terrorists
- Predict where money/resources will flow Measure how unexpected a link is
- Fraudulent transactions
- Suspicious network traffic Turn this into a machine learning problem
Local Similarity Based Measures
- Common Neighbors: How many neighbors the nodes have in common
- Jaccard Coefficient: $$ J(A,B)=\frac{|A \cap B|}{|A \cup B|}
overlap(v_{i},v_{j})= \frac{|N_{i}\cap N_{j}|}{|N_{i}\cup N_{j}|-2}
## Graph Convolutional Networks