IPython Parallel
- (rev. 4)
- Sardor
Overview #
One of IPython's main features is support for parallel and distributed computing. This is provided by a dedicated framework named ipyparallel (formerly IPython.parallel). The framework supports a range of parallel computing architectures and job schedulers, which can help speed up computation in applications.
Architecture #
The IPython architecture consists of four components:
- The IPython engine
- The IPython hub (part of IPython controller)
- The IPython schedulers (part of IPython controller)
- The controller client
IPython Engine #
The IPython engine is a Python instance that takes Python commands over a network connection and executes them. An engine can also handle incoming and outgoing Python objects sent over a network connection. When multiple engines are started, parallel and distributed computing becomes possible. An important feature of an IPython engine is that it blocks while user code is being executed.
IPython Controller #
The IPython controller provides an interface for delivering tasks to the engines. It is a collection of processes to which IPython engines and clients connect, composed of a Hub and a collection of Schedulers. The Schedulers typically run in separate processes on the same machine as the Hub, but they can run anywhere, from local threads to remote machines.
The Hub #
The Hub is the center of an IPython cluster. It is the process that keeps track of engine connections, schedulers, and clients, as well as all task requests and results. The main role of the Hub is to facilitate queries of the cluster state and to minimize the information required to establish the many connections involved in connecting new clients and engines.
The Scheduler #
All actions that can be performed on an engine go through a Scheduler. While the engines themselves block when user code is run, the schedulers hide that from the user to provide a fully asynchronous interface to a set of engines.
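The asynchronous behaviour described above can be sketched as follows. This is a minimal illustration, not the only way to submit work: it assumes ipyparallel is installed and a cluster is already running, and it uses the Client and view objects described in the next section. The names slow_add and submit_without_blocking are hypothetical helpers introduced here for illustration.

```python
def slow_add(a, b, delay=1.0):
    """Toy task that keeps one engine busy for `delay` seconds."""
    import time  # imported inside the function so it is available on the engine

    time.sleep(delay)
    return a + b


def submit_without_blocking():
    """Submit slow_add and show that the caller is not blocked by the engine.

    Assumes a running cluster (for example started with `ipcluster start -n 4`).
    """
    import ipyparallel as ipp  # lazy import: this file loads even without ipyparallel

    rc = ipp.Client()
    view = rc.load_balanced_view()

    ar = view.apply_async(slow_add, 1, 2)  # returns an AsyncResult immediately
    was_pending = not ar.ready()           # usually True right after submission
    result = ar.get(timeout=10)            # block only when the value is needed
    return was_pending, result
```

The engine that picks up slow_add is blocked for the duration of the call, but the submitting process is free to do other work until it calls get().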
IPython Client (and Views) #
There is one primary object, the Client, for connecting to a cluster. For each execution model it creates an appropriate View, and these views let users interact with the engines. There are two default views:
- The DirectView class for explicit addressing
- The LoadBalancedView class for destination-agnostic scheduling
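The difference between the two views can be sketched as below. This is an illustrative example under the assumption that ipyparallel is installed and a cluster is already running; cube and compare_views are hypothetical names, and the function is meant to be run from a script or interactive session so that cube can be shipped to the engines.

```python
def cube(x):
    """Plain function sent to the engines; it has no external dependencies."""
    return x ** 3


def compare_views(values):
    """Apply cube to `values` through both default views and return both results.

    Assumes a running cluster (for example started with `ipcluster start -n 4`).
    """
    import ipyparallel as ipp  # lazy import: this file loads even without ipyparallel

    rc = ipp.Client()                # connect using the default profile
    dview = rc[:]                    # DirectView addressing every engine explicitly
    lview = rc.load_balanced_view()  # LoadBalancedView: the scheduler picks the engine

    direct = dview.map_sync(cube, values)    # input is partitioned across all engines
    balanced = lview.map_sync(cube, values)  # each item goes to whichever engine is free
    return list(direct), list(balanced)
```

Both calls should return the same values in the same order; what differs is who decides where each piece of work runs.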
Getting started #
To start using an IPython cluster, we first need to make sure that ipyparallel is installed on our system. If it is not, it can be installed via pip.
How to install #
Installation works as for any other pip package:
$ pip install "ipython[parallel]"
Or, explicitly:
$ pip install ipyparallel
Example #
To use IPython for parallel computing, we need one instance of the controller and one or more instances of the engine. This example uses 4 engines. The simplest approach is to start the controller and the engines on a single host with the ipcluster command. To start 1 controller and 4 engines on localhost, we run:
$ ipcluster start -n 4
Once the controller and the 4 engines are running, we can use the engines to do something useful. To make sure everything is working correctly, we can run the following commands:
$ ipython
In [1]: from ipyparallel import Client
In [2]: c = Client()
In [3]: c.ids
Out[3]: [0, 1, 2, 3]
In [4]: c[:].apply_sync(lambda: "Hello, World")
Out[4]: ['Hello, World', 'Hello, World', 'Hello, World', 'Hello, World']
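As a further sanity check, one can confirm that the engines really are separate Python processes by asking each one for its process ID. The sketch below assumes ipyparallel is installed and the cluster from the example above is running; engine_pids and all_distinct are hypothetical helper names.

```python
import os


def all_distinct(items):
    """Return True if no value appears more than once in `items`."""
    items = list(items)
    return len(set(items)) == len(items)


def engine_pids():
    """Map each engine id to the OS process id it runs in.

    Assumes a running cluster (for example started with `ipcluster start -n 4`).
    """
    import ipyparallel as ipp  # lazy import: this file loads even without ipyparallel

    rc = ipp.Client()
    pids = rc[:].apply_sync(os.getpid)  # one result per engine, in engine order
    return dict(zip(rc.ids, pids))


# Usage, with a cluster running:
#     pids = engine_pids()
#     print(pids, all_distinct(pids.values()))
```

With all engines on one host, distinct process IDs indicate that each engine is its own Python instance.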
When a client is created with no arguments, it looks for the corresponding JSON connection file in the local ~/.ipython/profile_default/security directory. If we use a named profile, we can pass it to the Client instead. This should cover most cases:
In [2]: c = Client(profile='myprofile')
If the JSON file is in a different location or has a different name, we can create the client like this:
In [2]: c = Client('/path/to/my/ipcontroller-client.json')
The Client needs to be able to see the Hub's ports in order to connect. If the Hub is on a different machine, we may need to tunnel access through an SSH server on that machine. We can then connect with:
In [2]: c = Client('/path/to/my/ipcontroller-client.json', sshserver='me@myhub.example.com')
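The three ways of connecting shown above can be gathered into one small helper. This is only a sketch: client_arguments and connect are hypothetical names, and the precedence (an explicit connection file wins over a profile) is an assumption made for this example, not behaviour defined by ipyparallel.

```python
def client_arguments(profile=None, connection_file=None, sshserver=None):
    """Build positional and keyword arguments for ipyparallel.Client.

    Hypothetical helper: an explicit connection file takes priority over a
    profile name, and sshserver is added only when a tunnel is requested.
    """
    args = []
    kwargs = {}
    if connection_file is not None:
        args.append(connection_file)  # path to ipcontroller-client.json
    elif profile is not None:
        kwargs["profile"] = profile   # look up the file in that profile's directory
    if sshserver is not None:
        kwargs["sshserver"] = sshserver  # e.g. "user@host" used for tunneling
    return args, kwargs


def connect(**options):
    """Create a Client from the options accepted by client_arguments.

    Assumes ipyparallel is installed and a controller is reachable.
    """
    import ipyparallel as ipp  # lazy import: this file loads even without ipyparallel

    args, kwargs = client_arguments(**options)
    return ipp.Client(*args, **kwargs)
```

Calling connect() with no options is equivalent to Client(), which falls back to the default profile.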
References #
- http://ipython.org/ipython-doc/3/
- https://ipython.org/ipython-doc/3/parallel/
- https://en.wikipedia.org/wiki/IPython#Parallel_computing