Inheritance, Factories, and resorting to deepcopy

14 June 2008 10:50

I was recently working on a script that had to run many simulations of carrying out a set of projects. Every project consisted of the same set of tasks, but the time taken by some of the tasks was stochastic. It was about as tricky as it sounds.

I initially thought I might have a simple Project and a Task class, something like this:

class Task:
    def __init__(self, time):
        self.time = time

class Project:
    def __init__(self, tasks):
        self.tasks = tasks

t1 = Task(3)
t2 = Task(7)

tasks = [t1, t2]

projects = [Project(tasks) for i in range(40)]

But I needed the task times to be stochastic, and different for each project, so this doesn't work: the same two Task objects would be shared between all the projects. I thought of a few solutions. Firstly, I could have instantiated the tasks in a loop:

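from math import log
from random import random

def expRV(mean):
    # Exponentially distributed random value with the given mean.
    # "lambda" is a reserved word in Python, so it can't be a parameter name.
    # 1.0 - random() lies in (0, 1], which keeps log() away from zero.
    return -log(1.0 - random()) * mean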

class Task:
    def __init__(self, averageTime):
        self.duration = expRV(averageTime)

class Project:
    def __init__(self, tasks):
        self.tasks = tasks

projects = []

for i in range(40):
    t1 = Task(3)
    t2 = Task(7)

    tasks = [t1, t2]

    projects.append(Project(tasks))

That would work, but the whole project generation code would then need to sit inside a loop too. Alternatively, the time for the tasks could be made resettable, like this:

class Task:
    def __init__(self, averageTime):
        self.averageTime = averageTime

    def reset(self):
        self.duration = expRV(self.averageTime)

Either way, having to define the required tasks inside a loop like this looks a bit messy when there are a significant number of tasks.
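
As a rough sketch of how the resettable variant might be driven, assuming the runs are simulated one after another rather than kept side by side: the tasks are defined once, and every run redraws their durations before using them. The `simulate_run` helper and its sum-of-durations result are made up for illustration, and the standard library's `random.expovariate` stands in for `expRV`.

```python
from random import expovariate

class Task:
    def __init__(self, averageTime):
        self.averageTime = averageTime
        self.duration = None  # not drawn until reset() is called

    def reset(self):
        # expovariate takes the rate, which is 1 / mean
        self.duration = expovariate(1.0 / self.averageTime)

def simulate_run(tasks):
    # Hypothetical helper: redraw every duration, then add them up.
    for t in tasks:
        t.reset()
    return sum(t.duration for t in tasks)

tasks = [Task(3), Task(7)]

# The same two Task objects are reused; only their durations change per run.
totals = [simulate_run(tasks) for i in range(40)]
```

This only helps when one run's durations can be thrown away before the next run starts. If all 40 projects have to exist at the same time, the tasks still need to be separate objects.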

Another route was to pass a list of classes to the projects, so they could instantiate each class for each run:

class Task:
    def __init__(self):
        self.duration = expRV(self.averageTime)

class Project:
    def __init__(self, tasks):
        self.taskClasses = tasks

    def setup(self):
        self.tasks = [t() for t in self.taskClasses]

class T1(Task):
    averageTime = 3

class T2(Task):
    averageTime = 7

tasks = [T1, T2]

projects = [Project(tasks) for i in range(40)]

This is relatively tidy, but is quite verbose if there are a lot of different tasks. One more option was to use factories:

class Task:
    def __init__(self, duration):
        self.duration = duration

class TaskFactory:
    def __init__(self, averageTime):
        self.averageTime = averageTime

    def __call__(self):
        return Task(expRV(self.averageTime))

class Project:
    def __init__(self, tasks):
        self.taskFactories = tasks

    def setup(self):
        self.tasks = [t() for t in self.taskFactories]

t1 = TaskFactory(3)
t2 = TaskFactory(7)

tasks = [t1, t2]

projects = [Project(tasks) for i in range(40)]

This scheme looked quite good until I wanted to have dependencies between the tasks. Then I realised there was no simple way to carry dependencies declared between the factories over to the tasks they produce. The class-based method discussed previously has the same problem. So finally I went back to the first working idea, instantiating the tasks within a loop. But instead of actually doing that, I instantiated them once and then created a deepcopy for each project:

from copy import deepcopy

class Task:
    def __init__(self, averageTime, depends=None):
        self.averageTime = averageTime
        # Other tasks this one depends on
        self.depends = depends or []

    def reset(self):
        self.duration = expRV(self.averageTime)

class Project:
    def __init__(self, tasks):
        self.tasks = tasks

t1 = Task(3)
t2 = Task(7, depends = [t1])

tasks = [t1, t2]

projects = [Project(deepcopy(tasks)) for i in range(20)]

I wasn't too keen to use deepcopy, but for this task it worked perfectly.
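
What makes deepcopy fit here is that copying the whole list in one call keeps the references between the tasks intact inside each copy: the copied t2 depends on the copied t1, not on the original. A small check of that behaviour, assuming a Task that stores its dependencies in a `depends` list (the attribute name follows the call above, the rest is a guess at the shape of the class):

```python
from copy import deepcopy

class Task:
    def __init__(self, averageTime, depends=None):
        self.averageTime = averageTime
        self.depends = depends or []

t1 = Task(3)
t2 = Task(7, depends=[t1])
tasks = [t1, t2]

copied = deepcopy(tasks)
c1, c2 = copied

# The copies are new objects...
print(c1 is t1)             # False
# ...and the dependency inside the copy points at the copied t1,
print(c2.depends[0] is c1)  # True
# not back at the original.
print(c2.depends[0] is t1)  # False
```

Copying the tasks one at a time would not give the same result. Each deepcopy call keeps its own record of what it has already copied, so a separate `deepcopy(t2)` would carry a private copy of t1 that is not the one in the project's list.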

Looking back, it is surprising how many options there were, and I think for each of them there is a situation where it would be the best choice.