If, like me, you are not a programmer by trade but work with large datasets, here are some Python tips for speeding up your code and saving memory.
- Benchmark: split your code into functions and time each one, so you know where the time actually goes before you optimize anything (see the timeit sketch after this list).
- Generators: they produce items one at a time instead of building a whole list in memory (see the generator sketch below).
- Python objects are flexible but not memory-efficient; if you create many small records, use namedtuple or __slots__ (sketch below).
- Cached lookups are faster than repeated attribute or global lookups: bind whatever you use inside a loop to a local name first (sketch below).
- Keep the work at a high level: prefer Python's built-in functions (sum, min, max, map, ...), which run in C, over hand-written loops (sketch below).
- The itertools module is very handy for large datasets (sketch below). You can find it here: http://docs.python.org/library/itertools.html
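A minimal benchmarking sketch using the standard timeit module; the two functions being compared (build_with_append and build_with_comprehension) are made-up examples, not part of the original tips:

```python
import timeit

def build_with_append(n=10000):
    # build a list by appending in a loop
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def build_with_comprehension(n=10000):
    # same result, built with a list comprehension
    return [i * i for i in range(n)]

# time each candidate a fixed number of times and print the result
for func in (build_with_append, build_with_comprehension):
    seconds = timeit.timeit(func, number=200)
    print(func.__name__, round(seconds, 4), "seconds")
```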
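A small sketch of the generator idea: the generator expression produces one value at a time instead of materializing a million-element list. The numbers are arbitrary:

```python
# list comprehension: builds every value in memory before summing
total_list = sum([i * i for i in range(1000000)])

# generator expression: same answer, values produced one at a time
total_gen = sum(i * i for i in range(1000000))

print(total_list == total_gen)  # True
```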
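A sketch of the namedtuple / __slots__ tip; the Point classes and their fields are invented for illustration:

```python
from collections import namedtuple

# a plain class stores attributes in a per-instance __dict__,
# which costs extra memory for every object you create
class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# __slots__ reserves fixed space for these attributes and
# skips the per-instance __dict__
class SlotPoint:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

# namedtuple gives a lightweight, immutable record with named fields
Point = namedtuple("Point", ["x", "y"])

p = Point(1.0, 2.0)
print(p.x, p.y)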
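A sketch of caching a lookup in a local variable: the module attribute math.sqrt is otherwise looked up on every loop iteration. The data here is made up:

```python
import math

values = range(100000)

# uncached: math.sqrt is resolved again on every iteration
def total_uncached():
    total = 0.0
    for v in values:
        total += math.sqrt(v)
    return total

# cached: bind the function to a local name once, outside the loop
def total_cached():
    sqrt = math.sqrt  # local lookup is cheaper than a module attribute lookup
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total
```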
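A sketch of the "stay high level" tip: the built-in sum() runs its loop in C, while the hand-written loop runs in the interpreter. The numbers are arbitrary:

```python
numbers = list(range(1000000))

# interpreted loop: every iteration goes through Python bytecode
total = 0
for n in numbers:
    total += n

# built-in: the same loop runs in C inside sum()
total_builtin = sum(numbers)

print(total == total_builtin)  # True
```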
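A couple of itertools examples that come up with large datasets; chain and islice work lazily, so nothing is copied into a new list. The inputs are invented:

```python
from itertools import chain, islice

evens = (n for n in range(1000000) if n % 2 == 0)
odds = (n for n in range(1000000) if n % 2 == 1)

# chain: iterate over several iterables as one, without concatenating lists
combined = chain(evens, odds)

# islice: take just the first few items of a (possibly huge) iterator
first_ten = list(islice(combined, 10))
print(first_ten)
```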
Which of these is faster (or uses less memory)? (“>” means better)
- d.iteritems() > d.items() (this applies to Python 2; in Python 3, d.items() already returns a lazy view)
- str.join() > repeated string concatenation with + (sketch after this list)
- readline(), or iterating over the file with a for loop, > readlines(), which loads the whole file into memory (sketch below)
- yield > append: return results lazily from a generator instead of building a list (sketch below)
- If you only need the smallest or largest items of a large list, use the heapq module instead of sorting the whole thing (sketch below)
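A sketch of join versus repeated concatenation: each += builds a brand-new string, while str.join allocates the final string once. The words list is made up:

```python
words = ["large", "dataset", "tips"] * 10000

# slow: each += creates a new string object
text = ""
for w in words:
    text += w + " "

# fast: join builds the result in one pass
text_joined = " ".join(words) + " "

print(text == text_joined)  # True
```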
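A sketch of reading a large file line by line; "data.txt" is a hypothetical file name:

```python
# readlines() loads every line into one big list in memory:
# with open("data.txt") as f:
#     lines = f.readlines()

# iterating over the file object reads one line at a time instead
with open("data.txt") as f:
    for line in f:
        print(line.rstrip())  # stand-in for real per-line processing
```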
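A sketch of the yield-versus-append tip; parse_line and the input lines are invented placeholders:

```python
def parse_line(line):
    # placeholder for real per-line parsing
    return line.strip().split(",")

def parse_all_append(lines):
    # builds the full result list in memory before returning
    results = []
    for line in lines:
        results.append(parse_line(line))
    return results

def parse_all_yield(lines):
    # hands back one parsed line at a time; the caller decides
    # whether to keep them all or process and discard them
    for line in lines:
        yield parse_line(line)

for record in parse_all_yield(["a,b,c", "d,e,f"]):
    print(record)
```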
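A sketch of the heapq tip: nsmallest avoids sorting the whole list when you only need a few items, and a heap also works as a running priority queue. The numbers are arbitrary:

```python
import heapq
import random

values = [random.random() for _ in range(1000000)]

# only the ten smallest values, without sorting the full list
smallest = heapq.nsmallest(10, values)
print(smallest[:3])

# a heap as a running priority queue
heap = []
for v in values[:100]:
    heapq.heappush(heap, v)
print(heapq.heappop(heap))  # smallest of the pushed items
```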
If you have more tips don’t hesitate to share.