Measure Twice

Relaying

My latest Twisted adventure began with a comment I came across in relay.py:

1
2
3
4
5
class RelayerMixin:

    # XXX - This is -totally- bogus
    # It opens about a -hundred- -billion- files
    # and -leaves- them open!

This seemed like a worthy problem to investigate so that, at the very least, I could write a ticket to track the issue.

The first challenge was to set up a smart host configuration with Twisted. A smart host is a mail server which accepts mail to any address and then determines the mail exchange for the address and connects to it to relay the mail. Unlike an open relay, a smart host imposes restrictions on the source of messages. While some may accept mail only from authenticated senders, Twisted’s default is to relay any mail received over a Unix socket or from localhost.

It was easy enough to run a smart host on my development machine. I just had to invoke twistd mail with the relay option and specify a directory to hold messages to be relayed:

1
twistd -n mail --relay=/tmp/mail_queue

The smart host uses DNS to look up mail exchanges and contacts them via SMTP on port 25. Because my ISP does not allow outgoing traffic on port 25 and because I did not want to relay test messages to real mail servers, I needed to make some changes to the Twisted source so that the email messages would be relayed to a Twisted mail server that I ran on a second computer. I modified relaymanager.py to relay to port 8025 and to use a hosts file for DNS resolution.

1
2
3
4
5
6
7
8
9
10
11
12
class SmartHostSMTPRelayingManager:
    ...
    # PORT = 25
    PORT = 8025
    ...
    def _checkStateMX(self):
        ...
        if self.mxcalc is None:
            # self.mxcalc = MXCalculator()
            from twisted.names.client import createResolver
            resolver = createResolver(None, None, b"/tmp/hosts")
            self.mxcalc = MXCalculator(resolver)

The hosts file maps example.com and example.net to the IP address of the computer running the target mail server.

1
2
10.224.77.149 example.com
10.224.77.149 example.net

I configured that server to run on the default port, 8025, and accept mail for a few users on the domains example.com and example.net:

1
2
twistd -n mail -d example.com=/tmp/example.com -u jim=pwd -u nat=pwd
-d example.net=/tmp/example.net -u joe=pwd -u bob=pwd

When I used telnet on the development machine to send mail to the smart host running on the same machine and addressed it to one of the configured users on example.com or example.net, the smart host relayed it to the mail server on the second machine.

Now that I had a usable configuration, I wanted to explore the implications of the comment that RelayerMixin opened a large number of files and never closed them. RelayerMixin is used to introduce a set of functions for relaying mail to another class, a relayer, through inheritance. On initialization, the relayer calls one of the RelayerMixin functions, loadMessages, with a list of the pathnames of messages which it is responsible for relaying. loadMessages opens each message file and stores the file object in a list. I hypothesized that if I sent a lot of messages to the smart host at once, its relayers would open files for all the messages and hit the operating system limit for open files.

I wrote a short program to send the SMTP commands for a series of messages to the smart host running on port 8025 of the same machine. The messages are randomly destined to one of two addresses on each of the two domains served by the mail server on the other machine.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
from twisted.internet import protocol, reactor
from twisted.test.proto_helpers import LineSendingProtocol
from twisted.internet.defer import Deferred
from random import randint

NUM_MESSAGES = 250

addresses = ['joe@example.net', 'bob@example.net',
             'jim@example.com', 'nat@example.com']
num_addrs = len(addresses) - 1

msgs = ['helo']

for i in range(0, NUM_MESSAGES):
    origin = 'foo@example.com'
    destination = addresses[randint(0, num_addrs)]
    msgs.append('mail from: <{}>'.format(origin))
    msgs.append('rcpt to: <{}>'.format(destination))
    msgs.append('data'),
    msgs.append('from {} to {}'.format(origin, destination)),
    msgs.append('hi {}'.format(destination)),
    msgs.append('.'),

msgs.append('quit')
client = LineSendingProtocol(msgs)

done = Deferred()
f = protocol.ClientFactory()
f.protocol = lambda: client
f.clientConnectionLost = lambda *args: done.callback(None)

def finished(reason):
    reactor.stop()

done.addCallback(finished)

reactor.connectTCP('127.0.0.1', 8025, f)
reactor.run()

As I increased the number of messages sent, I expected to eventually see an exception occur when too many files were opened but that did not occur no matter how many messages were sent. From the server log, I observed that instead of opening one connection to the mail server for each domain and sending all the queued messages for that domain, the smart host was repeatedly connecting to the mail server and sending no more than a few messages at a time. That explained why the limit on open files was not being reached. The relayers were being handed only a few messages at a time so there was no need to open a lot of files at once.

This strategy for allocating work to relayers did not seem very efficient so I started exploring further. SmartHostSMTPRelayingManager, which implements the smart host functionality, has a function, checkState, which is called periodically to see if there are messages waiting to be relayed and if there is capacity to create new relayers. If so, it calls _checkStateMX to create relayers and allocate messages to them. It turns out that _checkStateMX contains a subtle bug which is the cause of the allocation behavior.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def _checkStateMX(self):
    nextMessages = self.queue.getWaiting()
    nextMessages.reverse()

    exchanges = {}
    for msg in nextMessages:
        from_, to = self.queue.getEnvelope(msg)
        name, addr = rfc822.parseaddr(to)
        parts = addr.split('@', 1)
        if len(parts) != 2:
            log.err("Illegal message destination: " + to)
            continue
        domain = parts[1]

        self.queue.setRelaying(msg)
        exchanges.setdefault(domain, []).append(self.queue.getPath(msg))
        if len(exchanges) >= (self.maxConnections - len(self.managed)):
            break

_checkStateMX asks the relay queue for a list of waiting messages. Then it loops through the messages, grouping them by target domain. Eventually, each group will be handed off to a relayer. The problem is that _checkStateMX breaks out of the loop as soon as it has at least one message for the maximum number of domains it can concurrently contact. That value, maxConnections, is an optional parameter to SmartHostSMTPRelayingManager.__init__. Its default value is 2.

As _checkStateMX loops through the waiting messages, it creates a list of messages for the first domain it sees and keeps adding messages for that domain to the list. When it sees a second domain, it creates another list for that domain but since it has hit the limit on connections, it breaks out of the loop. So, any other messages in the queue for either domain must wait to be sent even though they could be handled by the same relayers. Instead of breaking out of the loop when it reaches the connection limit, _checkStateMX should continue to add messages to the lists for the domains it has already seen and ignore messages for other domains.

With the understanding of how messages are allocated to relayers, I was now easily able to trigger an exception for too many open files by sending a large number of messages to one domain instead of splitting them between two.

As a result of this exploration, I filed and submitted fixes for two issue tickets, a defect ticket for the handling of open files by RelayerMixin, and an enhancement ticket to improve how messages are allocated to relayers.

Unit Tests

One of the first Twisted mail issue tickets I tackled involved what seemed to be the simple matter of adding missing unit tests but led to some refactoring of Twisted code. The ticket highlights the missing unit tests for the startMessage and exists functions of both the AbstractMaildirDomain and DomainQueuer classes.

AbstractMaildirDomain is a base class which is meant to be subclassed to create mail domains where the emails are stored in the Maildir format. Most of the functions it provides are placeholders that need to be overridden in subclasses. However, it does provide implementations for two functions, exists and startMessage. exists checks whether a user exists in the domain or an alias of it and, if so, returns a callable which returns a MaildirMessage to store a message for the user. Otherwise, it raises an SMTPBadRcpt exception. startMessage returns a MaildirMessage which stores a message for the user.

The existing unit tests for AbstractMaildirDomain were minimal, checking just that it fully implemented the IAliasableDomain interface. Additional test cases were needed to verify that:

  • startMessage returns a MaildirMessage
  • for a valid user, exists returns a callable which returns a MaildirMessage and that the callable returns distinct messages when called multiple times
  • for an invalid user, exists raises SMTPBadRcpt

AbstractMaildirDomain could not be used directly in testing exists and startMessage because both call a function which has only a placeholder in the base class. So for testing purposes, I created a subclass of AbstractMaildirDomain, TestMaildirDomain, which overrides the placeholder functions.

Since each test would need a TestMaildirDomain to exercise, I wrote a setUp function which runs prior to the test and creates a TestMaildirDomain as well as a temporary Maildir directory for it to use. I also added a tearDown function which runs after each test to remove the temporary directory.

1
2
3
4
5
6
7
8
9
10
class AbstractMaildirDomainTestCase(unittest.TestCase):

    def setUp(self):
        self.root = self.mktemp()
        self.service = mail.mail.MailService()
        self.domain = TestMaildirDomain(self.service, self.root)
        self.domain.addUser('user',"")

    def tearDown(self):
        shutil.rmtree(self.root)

The tests for startMessage and exists turned out to be pretty straightforward:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def test_startMessage(self):
    msg = self.domain.startMessage('user@')
    self.assertTrue(isinstance(msg, maildir.MaildirMessage))

def test_exists(self):
    f = self.domain.exists(mail.smtp.User("user@", None, None, None))
    self.assertTrue(callable(f))
    msg1 = f()
    self.assertTrue(isinstance(msg1, mail.maildir.MaildirMessage))
    msg2 = f()
    self.assertTrue(msg1 != msg2)
    self.assertTrue(msg1.finalName != msg2.finalName)

def test_doesntexist(self):
    self.assertRaises(mail.smtp.SMTPBadRcpt, self.domain.exists,
            mail.smtp.User("nonexistentuser@", None, None, None))

Like AbstractMaildirDomain, DomainQueuer acts as a domain, but instead of saving emails for users, it puts messages in a queue for relaying. Its exists and startMessage functions are meant to be used in the same way as those of AbstractMaildirDomain. It seemed like it would be an easy matter to adapt the unit tests for AbstractMaildirDomain to work for DomainQueuer.

It turned out, however, that the test for exists on DomainQueuer failed because a function was being called on a variable set to None. Something clearly hadn’t been properly configured for the test.

Here’s the DomainQueuer code being exercised in the test:

1
2
3
4
5
6
7
8
9
10
11
class DomainQueuer:

    def exists(self, user):
        if self.willRelay(user.dest, user.protocol):
            ...
        raise smtp.SMTPBadRcpt(user)

    def willRelay(self, address, protocol):
        peer = protocol.transport.getPeer()
        return (self.authed or isinstance(peer, UNIXAddress) or
            peer.host == '127.0.0.1')

exists calls willRelay with the protocol which was passed into it as part of the user argument. willRelay tries to call getPeer on the transport instance variable of the protocol. But, the minimal User object passed to exists, which worked for the AbstractMaildirDomain test, is not sufficient for the DomainQueuer test. willRelay needs the protocol to determine whether it should relay the message.

I had two thoughts about how to get around this problem, but I wasn’t sure either of them was satisfactory. One option would be to provide the User object with a fake protocol and fake transport so that willRelay could be configured to return the desired value. That seemed to be a very roundabout way to solve the problem of getting willRelay to return a specific boolean value.

A more direct way to get willRelay to return the desired value would be to monkeypatch it. That is, as part of the unit test, to replace the DomainQueuer.willRelay function with one that returns the desired value. The problem with monkeypatching is that, even though willRelay is part of the public interface of DomainQueuer, it could change in the future. For example, it could received additional optional arguments. Then, the unit test which used the monkeypatched version might not reflect the behavior of the real version of willRelay.

I got some great advice about how to approach this problem and unit testing in general on the Twisted developers IRC channel. The idea behind unit testing is to test each unit of code independently. Here we can consider exists and willRelay to be two different units. But the way willRelay is implemented means that it is difficult to test exists independent of it. I was advised that sometimes the best thing to do when adding unit tests is to refactor the code itself. So, I attempted to do that without changing the public interface of DomainQueuer.

I introduced a base class, AbstractRelayRules, whose purpose is to encapsulate the rules for determining whether a message should be relayed. Then, I defined a subclass, DomainQueuerRelayRules, which contains the default rules for the DomainQueuer class.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class AbstractRelayRules(object):
    """
    A base class for relay rules which determine whether a message
    should be relayed.
    """

    def willRelay(self, address, protocol, authorized):
        """
        Determine whether a message should be relayed.
        """
        return False


class DomainQueuerRelayRules(object):
    """
    The default relay rules for a DomainQueuer.
    """

    def willRelay(self, address, protocol, authorized):
        """
        Determine whether a message should be relayed.

        Relay for all messages received over UNIX sockets or from 
        localhost or when the message originator has been authenticated.
        """
        peer = protocol.transport.getPeer()
        return (authorized or isinstance(peer, UNIXAddress) or
            peer.host == '127.0.0.1')

In DomainQueuer, I changed the __init__ function to take a new optional parameter specifying the rules to be used to determine whether to relay. When relayRules is not provided, the default DomainQueuerRelayRules is used. I also changed the willRelay function to ask the relayRules whether to relay instead of determining that itself. Existing code which creates a DomainQueuer without providing the extra argument works in the exact same way as the old code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class DomainQueuer:

    def __init__(self, service, authenticated=False, relayRules=None):
        ...
        self.relayRules = relayRules
        if not self.relayRules:
            self.relayRules = DomainQueuerRelayRules()


    def willRelay(self, address, protocol):
        """
        Determine whether a message should be relayed according
        to the relay rules.
        """
        return self.relayRules.willRelay(address, protocol, self.authed)

Finally, I created another subclass of AbstractRelayRules to be used for test purposes. TestRelayRules can be configured with a boolean value which determines whether relaying is allowed or not.

1
2
3
4
5
6
7
class TestRelayRules(mail.relay.AbstractRelayRules):

    def __init__(self, relay):
        self.relay = relay

    def willRelay(self, address, protocol, authorized):
        return self.relay

Now, the unit tests for DomainQueuer.exists using TestRelayRules are quite simple.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class DomainQueuerTestCase(unittest.TestCase):

    def test_exists(self):
        dq = mail.relay.DomainQueuer(self.service, False, TestRelayRules(True))
        f = dq.exists(mail.smtp.User("user@", None, None, None))
        self.assertTrue(callable(f))
        msg1 = f()
        self.assertTrue(isinstance(msg1, mail.mail.FileMessage))
        msg2 = f()
        self.assertTrue(msg1 != msg2)
        self.assertTrue(msg1.finalName != msg2.finalName)

    def test_doesntexist(self):
        dq = mail.relay.DomainQueuer(self.service, False, TestRelayRules(False))
        self.assertRaises(mail.smtp.SMTPBadRcpt, dq.exists,
                mail.smtp.User("user@", None, None, None))

Refactoring to decouple the rules for relaying messages from the DomainQueuer certainly made the unit testing code much cleaner. As a general matter, difficulty in writing unit tests may highlight dependencies in the code which should be refactored.

Types

Much of the work of documenting the Twisted Mail API has involved searching through the Python code to determine the types for parameters and return values. It often involves comparing functions in different classes which inherit from the same base class or implement the same interface. In some cases, I’ve resorted to looking at unit tests or example code to see how objects are used. After a recent experience while tracking down types, I’m more convinced than ever of the value of the API documentation.

I was documenting the alias module, which contains classes for redirecting mail from one user to another user, to a file, to a process, and to a group of aliases. Four different classes inherit from the base class AliasBase and implement the interface IAlias, which contains the function createMessageReceiver. The class hierarchy looks like this:

twisted.mail.alias.AliasBase
    twisted.mail.alias.AddressAlias
    twisted.mail.alias.AliasGroup
    twisted.mail.alias.FileAlias
    twisted.mail.alias.ProcessAlias

I was trying to determine the return value of IAlias.createMessageReceiver. The return value was clear for three of the four classes that implement IAlias because the object to be returned was created in the return statement.

FileAlias -> FileWrapper 
ProcessAlias -> MessageWrapper   
AliasGroup -> MultiWrapper 

The objects returned are all message receivers which implement the smtp.IMessage interface. They deliver a message to the appropriate place: a file, a process or a group of message receivers. It seemed pretty clear that the return value of the createMessageReceiver function in the IAlias interface should be smtp.IMessage. However, there was one more class that implemented the interface, AddressAlias, and the return value from that wasn’t so clear.

1
2
3
4
5
6
7
8
9
class AddressAlias(AliasBase):
    implements(IAlias)

    def __init__(self, alias, *args):
        AliasBase.__init__(self, *args)
        self.alias = smtp.Address(alias)

    def createMessageReceiver(self):
        return self.domain().exists(str(self.alias))

AddressAlias.createMessageReceiver returns the result of a call to exists on the result of a call to domain. domain is a base class function which returns an object which implements the IDomain interface. Fortunately, the IDomain interface was documented. It returns a callable which takes no arguments and returns an object implementing IMessage. Unfortunately, this return value didn’t match the pattern of the other three classes implementing IAlias.createMessageReceiver, all of which return an object implementing IMessage.

Although messy, it was possible that the return value of IAlias.createMessageReceiver was either an smtp.IMessage provider or a callable which takes no arguments and returns an smtp.IMessage provider. Or, it might have been a mistake.

At this point, I fortuitously happened to be looking at this code in an old branch and noticed a difference. There, the AddressAlias.createMessageReceiver function appeared as follows:

1
2
def createMessageReceiver(self):
    return self.domain().startMessage(str(self.alias))

After some investigation, I found a ticket that had been fixed earlier this year to remove calls to the deprecated IDomain.startMessage function. In the old code, startMessage also returns an IMessage provider. So, it seemed that a bug had been introduced in the switch from startMessage to exists.

The result of the call to exists must be invoked to get the proper message receiver to return. The code should read:

1
2
def createMessageReceiver(self):
    return self.domain().exists(User(self.alias), None, None, None)()

I filed a ticket in the issue tracking system and subsequently submitted a fix. While reworking the unit tests, I relied heavily on the API documentation I had written for the alias module. I think it’s safe to say that had the API been fully documented when the original change was made, this error would have been easy to spot during code review or to avoid in the first place.

API Documentation

My focus for the past few weeks has been on reviewing the API documentation for Twisted Mail and filling in the missing pieces. Why bother with documentation when there are bugs to fix and code to write? The quality of API documentation directly affects the experience that a developer has employing or further developing a library. It can be the difference between a piece of software being successfully integrated into a larger project or being discarded because the user just can’t figure out how it works.

The Twisted API documentation is automatically generated from comments embedded in the source code. The Twisted Coding Standards dictate that all methods, functions, classes, and modules have docstrings which describe their purpose and detail all parameters and attributes. The docstrings are written in Epytext Markup Language and are processed by pydoctor, an API documentation generator, to produce the API documentation. Epytext provides a way to specify formatting information, to annotate parameters, return values, and their types, and to create cross-reference links. Here’s an example of a docstring written in Epytext:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def exists(address, domain):
    """
    Checks whether a user exists in a domain.

    @param address: The user
    @type address: L{Address}

    @param domain: The domain
    @type domain: L{IDomain}

    @rtype: C{bool}
    @return: C{True} if the user exists in the domain, 
             C{False} otherwise
    """

And here’s the API documentation that would be generated from that docstring:

Currently, a good deal of Twisted Mail’s methods, functions, and classes are missing docstrings. Even those methods and functions that have docstrings often lack a full description of parameters, return values and their types. My task is to provide that missing information. It’s a great opportunity to dive deeply into the code. And, once the code is fully documented, I’ll find it much easier to fix bugs and introduce new features.

Tutorials

If there were a theme for my first two weeks as an Outreach Program for Women intern, it would be tutorials. As my project, I’m maintaining the mail portion of Twisted, an event-driven, networking framework in Python. I’ll be working to improve the documentation, make progress on partially completed tickets, and attack some new defect and enhancement tickets. Up until now, I had used several aspects of Twisted in a project and submitted a couple of patches, but I hadn’t used the mail subproject and wasn’t familiar with any of the mail protocols that Twisted supports. Working through the mail example code and tutorials seemed like a good introduction and would allow me to evaluate how well the documentation meets the needs of new users. Further, there were a few outstanding tickets related to documentation that I’d be able to work on straightaway.

I found immediate success in running the examples of clients that use the IMAP4 protocol to access email from a remote server. I was also able to run a simple email server that uses SMTP to receive messages, although I had to read the source code to figure out from which address it would accept messages. When it came to the examples of clients which use SMTP to send messages, nothing worked.

SMTP is a text-based protocol in which a client issues a series of commands and receives replies from a server in order to send mail. To investigate the problem with the SMTP client examples, I thought to use telnet to connect directly to the email server and issue the commands manually. Not only couldn’t I connect with telnet to the servers used in the examples, I couldn’t even connect to a Google server. In all of these cases, I was attempting to connect on port 25, which is the well-known port for SMTP. My OPW mentor, Jessica McKellar, theorized that the problem was caused by my ISP blocking outbound traffic for port 25. Verizon’s website confirms that they do this to prevent virus-infected computers from sending spam.

With this problem in mind, I moved on to the tutorial on how to build SMTP clients. I ran into the same problem that had earlier prompted a trouble ticket. The tutorial requires a server running on port 25 of localhost but it does not explicitly mention this requirement nor does it give directions for running such a server. A patch had been submitted to explicitly mention the requirement and suggest that the user could change the example to use an external SMTP server. A review of that patch had found no problem with the change in wording. The review, however, raised an ancillary issue, noting that the output listed in the tutorial did not match what was actually generated by running the examples. This same complaint was documented in another open ticket.

I wasn’t satisfied with the proposed solution, since the user might well be blocked from contacting an external server on port 25. I thought the best solution might be to include directions on running a server locally, to be used with the tutorial. With my limited knowledge of Twisted mail, I modified the example SMTP server to run on port 25 and accept mail from any user. There were some quirky requirements involved in running this server, but I was able to get through the tutorial with it.

Although I had gotten the tutorial working with a local server, I suspected that my solution might not be the most elegant. Rather than submit a patch and upon review learn of a better way to solve the problem, I thought it prudent to consult with the Twisted developer community on the twisted-dev IRC channel. I learned that I could use the twistd utility to start up a mail server with its configuration, including the port, specified through command-line flags.

When I mentioned that I planned to update the output listing in the tutorial in response to both tickets, one developer suggested that was futile since the output would vary based on release and platform and that it would be better just to mention in the tutorial that the listing was representative. He then closed the second ticket, explaining why it wouldn’t be fixed.

In the discussion, one of the developers asked me to amend the first ticket to summarize the problem and specify the completion condition. I specified that the solution was to make the tutorial self-contained so that a user would need no external knowledge to work through it. Then I revised the tutorial to make it so. At the same time, I fixed some issues of formatting, punctuation, grammar, clarity, and accuracy.

Before I could submit a patch, I needed to build the documentation. Similar to the process of building code, tools must be run on the raw form of the tutorial to produce nicely formatted output. Twisted includes a documentation generator tool called lore which translates XHTML input with annotations to indicate, among other things, links to API documentation and code listings into HTML output.

My patch received a quick review. In addition to some requests for minor changes and clarifications, the review included some comments which seemed to suggest that the examples used in the the tutorial might not best represent how to write an SMTP client. I was concerned because the ticket was currently limited in scope to making the tutorial self-contained. Changing the examples, while possibly an improvement, would require significantly reworking the tutorial. In the IRC channel, I asked for confirmation that this was what was being suggested and if so, should it be done as part of the same ticket. The consensus was that the suggestion constituted scope creep and should be addressed in a separate ticket. Now, I am left to submit a few minor revisions to hopefully close out the ticket.

If that weren’t enough in the way of tutorials, I also tackled a new tutorial on building SMTP servers. Comments had been sitting in the ticket for nine months. In addition to incorporating the comments, I edited the document and modified the presentation of some examples to hopefully make them more clear. I found the final example, which is meant to explain how to use the IMessageDeliveryFactory interface, to be lacking and wrote a new example to better showcase it. The changes are nearly ready to be submitted.

Work on these tutorials will continue but my focus is now turning to another area of documentation – the API. More on that to come.