My latest Twisted adventure began with a comment I came across in
This seemed like a worthy problem to investigate so that, at the very least, I could write a ticket to track the issue.
The first challenge was to set up a smart host configuration with Twisted. A smart host is a mail server which accepts mail to any address and then determines the mail exchange for the address and connects to it to relay the mail. Unlike an open relay, a smart host imposes restrictions on the source of messages. While some may accept mail only from authenticated senders, Twisted’s default is to relay any mail received over a Unix socket or from localhost.
It was easy enough to run a smart host on my development machine. I just had
to run twistd mail with the relay option and specify a directory to hold
messages to be relayed.
The smart host uses DNS to look up mail exchanges and
contacts them via SMTP on port 25. Because my ISP does not allow outgoing
traffic on port 25 and because I did not want to relay test messages to
real mail servers, I needed to make some changes to the Twisted source so that
the email messages would be relayed to a Twisted mail server that I ran on a
second computer. I modified
relaymanager.py to relay to port 8025 and to
use a hosts file for DNS resolution.
The hosts file maps
example.net to the IP address of the
computer running the target mail server.
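To make the hosts-file idea concrete, here is a minimal sketch of reading the standard hosts format (an IP address followed by one or more names) into a lookup table. It is not Twisted's resolver and not the change I made to relaymanager.py; the function name `parse_hosts` and the address 192.0.2.10 are placeholders chosen for illustration.

```python
def parse_hosts(text):
    """Map each host name in hosts-file text to its IP address.

    Hypothetical helper for illustration only, not part of Twisted.
    """
    table = {}
    for line in text.splitlines():
        # Drop comments, then skip lines that are empty or have no names.
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        for name in names:
            # The first entry for a name wins, as in most resolvers.
            table.setdefault(name.lower(), address)
    return table


if __name__ == "__main__":
    sample = "192.0.2.10  example.net  # target mail server\n"
    print(parse_hosts(sample))
```

With a table like this, a lookup for the mail server's domain returns the address of the second machine without any real DNS traffic.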
I configured that server to run on the default port, 8025,
and accept mail for a few users on the domains
example.com and example.net.
When I used telnet on the development machine to send mail to the smart host running on the same machine and addressed it to one of the configured users on example.com or example.net, the smart host relayed it to the mail server on the second machine.
Now that I had a usable configuration, I wanted to explore the implications of
the comment that
RelayerMixin opened a large number of files and never closed them.
RelayerMixin adds a set of functions for relaying mail
to another class, a relayer, through inheritance. On initialization, the
relayer calls one of these functions,
loadMessages, with a list
of the pathnames of messages which it is responsible for relaying.
loadMessages opens each message file and stores the file object in a list.
I hypothesized that if I sent a lot of messages to the smart host at once, its
relayers would open files for all the messages and hit the operating system
limit for open files.
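The hypothesis can be illustrated outside of Twisted with a small sketch. The class below is a simplified stand-in for a relayer that opens every message up front and keeps the file objects; it is not the RelayerMixin source. The names `HoldingLoader` and `files_opened_before_limit` are made up for this example, and the code assumes a Unix-like system where the `resource` module is available. It temporarily lowers the soft limit on open files so the failure shows up quickly.

```python
import errno
import os
import resource
import tempfile


class HoldingLoader:
    """Hypothetical stand-in: opens every message and holds the file objects."""

    def __init__(self):
        self.messages = []

    def loadMessages(self, paths):
        # Open each message file and keep it; nothing is closed here.
        for path in paths:
            self.messages.append(open(path, "rb"))

    def close(self):
        for fp in self.messages:
            fp.close()
        self.messages = []


def files_opened_before_limit(soft_limit=64):
    """Lower the soft open-file limit, then load files until opening fails."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"Subject: test\r\n\r\nbody\r\n")
        path = tmp.name
    loader = HoldingLoader()
    hit_limit = False
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    try:
        # Ask for more files than the limit allows.
        loader.loadMessages([path] * (soft_limit * 2))
    except OSError as e:
        if e.errno != errno.EMFILE:
            raise
        hit_limit = True  # "Too many open files"
    finally:
        opened = len(loader.messages)
        loader.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
        os.unlink(path)
    return opened, hit_limit


if __name__ == "__main__":
    print(files_opened_before_limit())
```

The number of files opened before the failure is always below the limit, because the process already holds some descriptors (standard input, output, error, and so on).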
I wrote a short program to send the SMTP commands for a series of messages to the smart host running on port 8025 of the same machine. Each message is addressed at random to one of four recipients: two addresses on each of the two domains served by the mail server on the other machine.
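A rough sketch of such a sender follows. It is not the program I used: it swaps hand-written SMTP commands for the standard library's `smtplib`, and the user names `alice` and `bob`, the sender address, and the function names are all placeholders.

```python
import random
import smtplib

# Hypothetical recipients: two users on each of the two test domains.
RECIPIENTS = [
    "alice@example.com",
    "bob@example.com",
    "alice@example.net",
    "bob@example.net",
]
SENDER = "tester@localhost"  # placeholder envelope sender


def make_envelopes(count, rng=random):
    """Build (sender, recipient, message) tuples with random recipients."""
    envelopes = []
    for i in range(count):
        recipient = rng.choice(RECIPIENTS)
        message = (
            "From: %s\r\nTo: %s\r\nSubject: test %d\r\n\r\nTest message %d\r\n"
            % (SENDER, recipient, i, i)
        )
        envelopes.append((SENDER, recipient, message))
    return envelopes


def send_all(envelopes, host="localhost", port=8025):
    """Send every envelope to the smart host over a single SMTP session."""
    with smtplib.SMTP(host, port) as smtp:
        for sender, recipient, message in envelopes:
            smtp.sendmail(sender, [recipient], message)


if __name__ == "__main__":
    # Requires a smart host listening on localhost:8025.
    send_all(make_envelopes(100))
```

Raising the count passed to `make_envelopes` is the knob for putting more messages into the relay queue at once.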
As I increased the number of messages sent, I expected to eventually see an exception occur when too many files were opened but that did not occur no matter how many messages were sent. From the server log, I observed that instead of opening one connection to the mail server for each domain and sending all the queued messages for that domain, the smart host was repeatedly connecting to the mail server and sending no more than a few messages at a time. That explained why the limit on open files was not being reached. The relayers were being handed only a few messages at a time so there was no need to open a lot of files at once.
This strategy for allocating work to relayers did not seem very efficient so I
started exploring further.
SmartHostSMTPRelayingManager, which implements
the smart host functionality, has a function,
checkState, which is called
periodically to see if there are messages waiting to be relayed and if there is
capacity to create new relayers. If so, it calls
_checkStateMX to create
relayers and allocate messages to them. It turns out that _checkStateMX
contains a subtle bug which is the cause of the allocation behavior.
_checkStateMX asks the relay queue for a list of waiting messages. Then it
loops through the messages, grouping them by target domain. Eventually, each
group will be handed off to a relayer. The problem is that _checkStateMX
breaks out of the loop as soon as it has at least one message for the maximum
number of domains it can concurrently contact. That value is an optional
parameter whose default value is 2.
As _checkStateMX loops through the waiting messages, it creates a list of
messages for the first domain it sees and keeps adding messages for that domain
to the list. When it sees a second domain, it creates another list for that
domain but since it has hit the limit on connections, it breaks out of the
loop. So, any other messages in the queue for either domain must wait to be
sent even though they could be handled by the same relayers. Instead of
breaking out of the loop when it reaches the connection limit, _checkStateMX
should continue to add messages to the lists for the domains it has already
seen and ignore messages for other domains.
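The difference between the two strategies is easy to see in a simplified model. The functions below are not the Twisted source; they only mimic the grouping behavior described above, with made-up names (`group_with_break`, `group_without_break`) and a queue represented as a list of (message path, recipient) pairs.

```python
def group_with_break(messages, max_connections=2):
    """Described behavior: stop as soon as the domain limit is reached."""
    exchanges = {}
    for path, recipient in messages:
        domain = recipient.split("@", 1)[1]
        exchanges.setdefault(domain, []).append(path)
        if len(exchanges) >= max_connections:
            break  # later messages for known domains are left waiting
    return exchanges


def group_without_break(messages, max_connections=2):
    """Proposed behavior: keep filling lists for domains already seen."""
    exchanges = {}
    for path, recipient in messages:
        domain = recipient.split("@", 1)[1]
        if domain not in exchanges and len(exchanges) >= max_connections:
            continue  # no capacity for a new domain; skip this message
        exchanges.setdefault(domain, []).append(path)
    return exchanges


if __name__ == "__main__":
    queue = [
        ("m1", "a@example.com"),
        ("m2", "b@example.com"),
        ("m3", "a@example.net"),
        ("m4", "b@example.com"),
        ("m5", "a@example.net"),
    ]
    print(group_with_break(queue))
    print(group_without_break(queue))
```

With the break, the loop stops at the first message for the second domain, so `m4` and `m5` wait for a later pass. Without it, both relayers receive every queued message for their domain.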
With the understanding of how messages are allocated to relayers, I was now easily able to trigger an exception for too many open files by sending a large number of messages to one domain instead of splitting them between two.
As a result of this exploration, I filed and submitted fixes for two issue
tickets: a defect ticket for the handling of open files in
RelayerMixin, and an enhancement ticket to improve
how messages are allocated to relayers.