Python tips for optimizing your code for big datasets

If, like me, you are not a programmer by trade but work with large datasets, here are some Python tips. They are aimed at anyone who needs to speed code up or cut its memory use.

  • Benchmark: split the code into functions and measure the speed of each one, so you know where the time actually goes (short sketches of all of these tips follow the list).
  • Generators: they produce items one at a time instead of building a whole list in memory.
  • Python objects are flexible but not fast; for many small records, use namedtuple or __slots__ instead of a plain class.
  • Cached lookups are faster than normal lookups: bind a frequently used function or attribute to a local variable before a tight loop.
  • Keep the heavy lifting in high-level built-ins (sum, map, max, ...), which loop in C, rather than in hand-written Python loops.
  • The itertools module is very handy. You can find it here: http://docs.python.org/library/itertools.html
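
A minimal sketch of the benchmark tip, using the standard timeit module; slow_sum and fast_sum are made-up stand-ins for your own functions:

    import timeit

    def slow_sum(n):
        # Hand-written loop: interpreted Python bytecode on every iteration.
        total = 0
        for i in range(n):
            total += i
        return total

    def fast_sum(n):
        # Built-in sum() over a range: the loop runs in C.
        return sum(range(n))

    # Run each candidate a fixed number of times and compare the totals.
    for func in (slow_sum, fast_sum):
        print(func.__name__, timeit.timeit(lambda: func(10**5), number=100))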
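
Generators in one sketch: a list comprehension materializes every element up front, while the equivalent generator expression yields them one at a time, so memory use stays flat however large the range gets:

    # Builds a million-element list in memory before summing it.
    total = sum([x * x for x in range(10**6)])

    # Yields one value at a time; only one element is alive at once.
    total = sum(x * x for x in range(10**6))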
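
The namedtuple / __slots__ tip, sketched with a hypothetical Point record; both avoid the per-instance __dict__ that makes ordinary classes memory-hungry when you hold millions of objects:

    from collections import namedtuple

    # namedtuple: immutable, tuple-sized, with named fields.
    Point = namedtuple('Point', ['x', 'y'])
    p = Point(1.0, 2.0)

    # __slots__: a mutable class that stores attributes in fixed slots
    # instead of a per-instance dictionary.
    class SlotPoint(object):
        __slots__ = ('x', 'y')

        def __init__(self, x, y):
            self.x = x
            self.y = y

    q = SlotPoint(3.0, 4.0)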
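
What the cached-lookup tip means in practice: binding a global function and a method to local names before a hot loop skips the repeated name and attribute lookups on every iteration (a sketch; the gain is modest but measurable in tight loops):

    import math

    values = range(10**6)

    # Slow: math.sqrt and results.append are looked up on every pass.
    results = []
    for v in values:
        results.append(math.sqrt(v))

    # Faster: cache both lookups in local variables once, outside the loop.
    results = []
    sqrt, append = math.sqrt, results.append
    for v in values:
        append(sqrt(v))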
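
The high-level and itertools tips together: built-ins and itertools do their looping in C, so chaining them usually beats nested Python loops (a sketch using chain and islice; the two iterators stand in for open file handles):

    from itertools import chain, islice

    file_a = iter(['1\n', '2\n'])
    file_b = iter(['3\n', '4\n'])

    # chain() walks several iterables as one lazy stream.
    lines = chain(file_a, file_b)

    # islice() takes the first n items without consuming the rest.
    print(list(islice(lines, 3)))   # ['1\n', '2\n', '3\n']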

Which one is faster (or consumes less memory)? (“>” means better; sketches follow the list.)

  • d.iteritems() > d.items() (in Python 2, iteritems() returns an iterator instead of building a list; in Python 3, items() is already a lazy view)
  • ''.join(parts) > building a string with repeated + concatenation
  • f.readline(), or looping over the file with for line in f, > f.readlines(), which loads the whole file into memory
  • yield > append: generating results one by one beats accumulating them in a list
  • to pick the n smallest or largest items of a list, use a heap (heapq) rather than sorting everything (for a full sort, the built-in sorted() is the right tool)
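
Sketches of the join and yield comparisons; parts and the line lists are made-up data:

    # join > repeated concatenation: += may copy the growing string each
    # time (quadratic in the worst case); ''.join builds it in one pass.
    parts = ['a'] * 10000
    s = ''
    for p in parts:
        s += p
    s = ''.join(parts)

    # yield > append: stream records out instead of collecting them first.
    def read_records(lines):
        for line in lines:
            yield line.strip()

    for rec in read_records(['x\n', 'y\n']):
        print(rec)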
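
And the heap comparison: heapq.nsmallest keeps only n items in a heap, so it beats sorting the whole list when you only need the top few (a sketch on random data):

    import heapq
    import random

    data = [random.random() for _ in range(10**5)]

    # Full sort: O(n log n), orders every element.
    smallest_by_sort = sorted(data)[:10]

    # Heap: roughly O(n log k) for the k smallest, no full sort needed.
    smallest_by_heap = heapq.nsmallest(10, data)

    assert smallest_by_heap == smallest_by_sort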

If you have more tips, don’t hesitate to share.
