Making a dynamic (temporal) network with Gephi

You can use Gephi to visualize dynamic networks. One thing I like about Gephi for temporal networks is that you can tune the time window of interactions. The first thing you need to do is put your data in GEXF format, which Gephi can read for dynamic networks. There is also a networkx module for writing GEXF files.

To write a file in GEXF format, you need to install pygexf first. Make sure you have lxml as well. Let’s assume your data consists of source, target and time. Source and target are nodes with unique labels, and time is the time of the interaction, which is a date in our example. An edge is a unique interaction between source and target, with start and end time points. OK, let’s start making a GEXF file (in Python):

from gexf import Gexf

gexf = Gexf('Your Name', '28-11-2012')
# make an undirected dynamic graph whose time values are dates
graph = gexf.addGraph('undirected', 'dynamic', '28-11-2012', timeformat='date')
# add each node with a unique id (here the label doubles as the id)
graph.addNode(Source, Source)
graph.addNode(Target, Target)
# make an edge with a unique id; the edge lasts from start to end
graph.addEdge(edge_id, Source, Target, start=Date, end=Date)
# write the GEXF output to fileout (an open, writable file object)
gexf.write(fileout)
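
The snippet above handles a single interaction. As a rough sketch of how a whole table of source, target and time rows could be prepared, the helper below collects the unique nodes and gives every interaction its own edge id. The function name build_graph_records, the sample banks and the ISO-style date strings are all made up for illustration, and the code is written for Python 3 with the standard library only.

```python
import csv
import io


def build_graph_records(rows):
    """Turn (source, target, date) rows into unique nodes and id-tagged edges."""
    nodes = set()
    edges = []
    for edge_id, (source, target, date) in enumerate(rows):
        nodes.add(source)
        nodes.add(target)
        # each interaction happens on a single date, so start == end
        edges.append((str(edge_id), source, target, date, date))
    return sorted(nodes), edges


# hypothetical sample data standing in for a real CSV file
sample = io.StringIO(
    "BankA,BankB,2012-03-01\n"
    "BankB,BankC,2012-03-05\n"
    "BankA,BankC,2012-03-09\n"
)
nodes, edges = build_graph_records(csv.reader(sample))
```

Each tuple in edges is laid out in the same order as the arguments of the graph.addEdge call above (id, source, target, start, end), so you can loop over it and add the edges one by one.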

Once the data is in the right format, import it into the latest version of Gephi. Gephi automatically shows a timeline at the bottom of the graph window. You can enable it and change the time window. The outcome in Gephi looks like this: a graph showing interactions between banks during the month of March.

Python tips for optimizing your code for big datasets

If, like me, you are not a programmer but work with large datasets, here are some tips for Python. They are for anyone who needs to speed up their code or save memory.

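  • Benchmark: split the code into functions and time each one separately to find out where it is slow.
  • Generators: produce items one at a time instead of building whole lists in memory.
  • Python objects are flexible but not fast; solution: namedtuple or __slots__.
  • Cached lookups are faster than normal lookups in Python: keep a frequently used method or attribute in a local variable instead of looking it up again inside a loop.
  • Keep your Python code high level: prefer built-in functions to hand-written loops.
  • The itertools module is very handy; it is part of the standard library.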
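
Here is a small sketch of several of the tips above, written for Python 3 with the standard library only. All the names (squares_list, squares_gen, Edge, SlottedEdge, collect) are invented for the example, and the actual speed-ups depend on your data and Python version, so time your own code before relying on any of them.

```python
import timeit
from collections import namedtuple


# Benchmark: time one small function in isolation with timeit.
def squares_list(n):
    return [i * i for i in range(n)]


elapsed = timeit.timeit(lambda: squares_list(1000), number=100)


# Generator: yields one value at a time instead of building a list in memory.
def squares_gen(n):
    for i in range(n):
        yield i * i


total = sum(squares_gen(1000))

# Lightweight records: neither of these carries a per-instance __dict__.
Edge = namedtuple('Edge', ['source', 'target', 'time'])


class SlottedEdge(object):
    __slots__ = ('source', 'target', 'time')

    def __init__(self, source, target, time):
        self.source = source
        self.target = target
        self.time = time


e1 = Edge('A', 'B', '2012-03-01')
e2 = SlottedEdge('A', 'B', '2012-03-01')


# Cached lookup: bind the method to a local name once, outside the loop.
def collect(n):
    out = []
    append = out.append  # avoids resolving out.append on every iteration
    for i in range(n):
        append(i)
    return out
```

The variable elapsed holds the total time in seconds for the 100 calls, which gives you a number to compare before and after a change.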

Which one is faster (or consumes less memory)? (“>” means better)

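  • d.iteritems() > d.items() (in Python 2; in Python 3, d.items() is already a lazy view)
  • join > repeated string concatenation with +
  • readline() or a for loop over the file > readlines()
  • yield > append
  • heapq > sorting the whole list, when you only need the few smallest or largest items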
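
A few of these comparisons can be sketched as follows (Python 3, standard library only). The data and variable names are made up, and io.StringIO stands in for a real file opened with open().

```python
import heapq
import io

words = ['gephi', 'gexf', 'python']

# join builds the result in one go; += in a loop may copy the string each time.
joined = ' '.join(words)

# Iterating over a file object reads one line at a time,
# whereas readlines() loads every line into a list first.
fake_file = io.StringIO("a\nb\nc\n")  # stand-in for open('data.txt')
line_count = 0
for line in fake_file:
    line_count += 1

# In Python 3, items() returns a view and does not copy the dictionary.
degrees = {'BankA': 2, 'BankB': 2, 'BankC': 2}
total_degree = 0
for name, degree in degrees.items():
    total_degree += degree

# heapq.nsmallest returns the n smallest items without sorting the whole list.
values = [9, 4, 7, 1, 8, 2]
three_smallest = heapq.nsmallest(3, values)
```

If you need the whole list in order, the built-in sorted() is still the simpler choice; the heap pays off when you only want a few items from a large collection.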

If you have more tips don’t hesitate to share.