Setting up Dovecot, the IMAP Server

A mail server is a computer on the network that acts as a virtual post office for emails. In the previous article published in February 2014, the author had explained how to set up an email server on Gentoo Linux using Postfix and Dovecot. This part guides readers on how to configure Dovecot, the IMAP server.

In Gentoo (and probably every other distro), details about Dovecot configuration are available in /etc/dovecot. The directory contains a few files and a conf.d directory for extra configuration of various aspects of the server. The configuration files are well documented with comments.

The main dovecot.conf file is something like this:

protocols = imap lmtp
listen = <ip>, 127.0.0.1
login_greeting = ABC mail service
verbose_proctitle = yes
shutdown_clients = yes
!include conf.d/*.conf

The protocols line specifies the protocols it must serve. It's been set to LMTP and IMAP. LMTP, as described in the first article in this series, is the local mail delivery protocol used by Postfix to transfer mails to Dovecot. You can add one more protocol there—the traditional POP3. But in the age of mobile devices and easily accessible email, I don't think anyone really uses POP3 as it involves downloading everything to a single machine.

You should put your public IP and localhost there. If you want to offer only a Web mail service, you can leave out the public IP. Dovecot must listen on 127.0.0.1 because that's where our Web mail client (Roundcube) will connect. We'll look into the configuration of Roundcube later.

The login greeting is nothing specific, so use anything you like. It is a protocol-level greeting message that most (if not all) clients never display when interacting with the mail server.

Verbose proctitle: As the documentation in the configuration file says, the verbose_proctitle option shows mailbox information in process names in ps (the process status command), which is automatically available in tools like top/htop. In a virtual mail setup, it is hard to identify the load offender when only the username and IP are shown. I recommend enabling this.

Shutdown clients: This is a rather debated setting – whether or not Dovecot should kill client connections when the master process shuts down. If it is enabled, the mail server will be unavailable for a short period during an upgrade. If it is disabled, the server stays available throughout, but existing processes (open connections) will not get the update – and what happens if a security fix is missed as a result? I prefer security to availability, so I recommend enabling this.

Now, in the same directory, we have dovecot-sql.conf.ext. In this file, Dovecot is configured to access the SQL database. The same connection configuration (only the connect option) must be specified in the beginning of dovecot-dict-sql.conf as well (which is used for expire and quota plugins).

driver = pgsql
connect = host=/run/postgresql dbname=mail user=mail password=<password>
default_pass_scheme = SHA512-CRYPT
password_query = SELECT * FROM active_users_passdb WHERE user = '%u';
user_query = SELECT * FROM active_users_userdb WHERE user = '%u';
iterate_query = SELECT user FROM active_users_userdb;

default_pass_scheme: This is the default password hashing scheme to be used. SHA512-CRYPT is the strongest scheme supported by Dovecot on most Linux distributions at the time of writing. Dovecot also supports BLF-CRYPT, which uses the highly secure bcrypt algorithm, but that requires a patched glibc installation.

password_query – The SQL query that Dovecot must use to authenticate a user.

user_query – The SQL query for fetching user information.

iterate_query – The SQL query for pre-fetching users. This is used by Dovecot when we run the mail indexer.

Authentication configuration: This is done in conf.d/10-auth.conf.

disable_plaintext_auth = yes
auth_mechanisms = plain login
!include auth-sql.conf

PLAIN and LOGIN are the most commonly used authentication mechanisms. The first option disables plaintext authentication over cleartext (non-encrypted) connections. You can enable it if needed.

In auth-sql.conf, we just need the following:

passdb {
  driver = sql
  args = /etc/dovecot/dovecot-sql.conf.ext
}

userdb {
  driver = prefetch
}

userdb {
  driver = sql
  args = /etc/dovecot/dovecot-sql.conf.ext
}

In 10-mail.conf there are various settings to be configured, but the most important ones are:

mail_location = mdbox:/var/vmail/%d/%n
mail_privileged_group = vmail
mail_fsync = optimized
mail_plugins = expire fts fts_lucene quota trash virtual

mail_location sets the path on the filesystem to store emails. mdbox is a format created by Dovecot itself to overcome performance-related and other issues with older storage formats like mbox and Maildir. mail_plugins enables various plugins for all the protocols.

Logging can be configured in 10-logging.conf. Configure it according to your needs. But temporarily, while the server isn't ready for production yet, enable the following options:

log_path = syslog
auth_debug = yes
auth_verbose = yes
mail_debug = yes

This will help in debugging any issues with Dovecot. If you don't have syslog, you can set it to a filename.

In 10-master.conf, ports and protocol mapping are configured:

service imap-login {
  inet_listener imap {
    port = 143
    ssl = no
  }
  inet_listener imaps {
    port = 993
    ssl = yes
  }
  service_count = 0
  vsz_limit = 256M
}

service lmtp {
  unix_listener /var/spool/postfix/private/dovecot-lmtp {
    group = postfix
    user = postfix
    mode = 0600
  }
}

service auth {
  unix_listener auth-userdb {
    mode = 0600
    user = vmail
    group = vmail
  }
  unix_listener /var/spool/postfix/private/auth {
    mode = 0660
    user = postfix
    group = postfix
  }
  user = dovecot
}

service dict {
  unix_listener dict {
    mode = 0660
    user = vmail
    group = vmail
  }
}

Let Dovecot listen on Port 143 for cleartext connections. There's no point in encrypting for clients connecting from the same machine (the Web mail client, i.e., Roundcube). You can block the plaintext Port 143 using iptables so that nobody from the Internet connects via the cleartext protocol.
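As an illustration, a pair of rules like the following would do it with iptables (this is only a sketch – adjust to your distribution's firewall tooling and existing rule set):

```shell
# Allow IMAP on loopback (for Roundcube), drop port 143 from everywhere else.
iptables -A INPUT -p tcp --dport 143 -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 143 -j DROP
```

External clients can then connect only via IMAPS on Port 993.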

The service lmtp and service auth sections are the interesting parts of the above configuration. In the lmtp section, Dovecot is configured to listen for LMTP connections at a UNIX socket path. We'll use the same path in the Postfix configuration – it tells Postfix where to deliver mails via LMTP.

Postfix is the SMTP server, but we need user authentication. Postfix must be configured to use Dovecot's authentication mechanism because we are storing encrypted passwords in the database. Postfix supports Dovecot-SASL. For the same reason, we have configured the Dovecot service auth to listen on a UNIX socket for connections.
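The Postfix side of this pairing looks roughly like the following fragment of main.cf (these are standard Postfix SASL parameters; the full Postfix setup is covered in the next part):

```text
# /etc/postfix/main.cf – use Dovecot's SASL socket for SMTP authentication
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/auth
smtpd_sasl_auth_enable = yes
```

The smtpd_sasl_path is relative to the Postfix queue directory, which is why Dovecot's listener was placed at /var/spool/postfix/private/auth above.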

In 15-lda.conf, we need the following settings:

recipient_delimiter = +
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes

protocol lda {
  mail_plugins = $mail_plugins sieve
}

In the first article in this series, we had created a function sender_bcc_map, which outputs username+Sent@domain for the input username@domain. In the above configuration, the recipient_delimiter option specifies that the email address should be split at '+', and the part after '+' should be treated as the destination folder name. This is similar to Gmail, where you can use any number of email aliases but everything gets delivered to the inbox and filters need to be set up manually – our mail server does the filtering automatically.

The Sieve plugin is loaded for the LDA protocol – Sieve runs at delivery time, so it does not apply to protocols like IMAP. Sieve is the RFC-defined standard language for mail filtering.

In 20-imap.conf, we need to load the anti-spam plugin, as follows:

protocol imap {
  mail_plugins = $mail_plugins antispam
}

Similarly, in 20-lmtp.conf, load the Sieve plugin for the LMTP protocol.

The ManageSieve protocol configuration, in 20-managesieve.conf, is as follows:

protocols = $protocols sieve

service managesieve-login {
  inet_listener sieves {
    port = 4190
    ssl = yes
  }
  inet_listener sieve {
    port = 4191
    ssl = no
  }
  service_count = 0
  vsz_limit = 256M
}

This instructs Dovecot to enable the ManageSieve protocol, with which users configure Sieve filter scripts by themselves. This is required if you want users to be able to configure filters using Roundcube or other Web mail clients and/or desktop clients like Thunderbird. Since security is important, we'll use two ports for Sieve. The standard port for ManageSieve is 4190, so that listener is open to the public and uses SSL. The other port, 4191, is cleartext and will be used only for local Web mail client connections.

Coming to the plugin settings in 90-plugin.conf, we need to configure four plugins – full-text search (Lucene), Trash, Expire and Antispam:

plugin {
  fts = lucene
  fts_lucene = whitespace_chars=@.
  trash = /etc/dovecot/dovecot-trash.conf.ext
  expire = Trash
  expire2 = Trash/*
  expire3 = Junk
  expire4 = Junk/*
  expire_dict = proxy::expire
  antispam_backend = spool2dir
  antispam_allow_append_to_spam = yes
  antispam_spam = Junk
  antispam_trash = Trash
  antispam_spool2dir_spam = /var/lib/dovecot/antispam/spam/%%lu
  antispam_spool2dir_notspam = /var/lib/dovecot/antispam/ham/%%lu
}

The trash plugin is useful when quotas are enabled – it will automatically delete messages from folders when a new incoming message cannot be saved because it exceeds the quota. In dovecot-trash.conf.ext you can configure the priority and the folder names it should delete from; folders with the lowest priority are cleaned first.
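For example, a minimal dovecot-trash.conf.ext could look like this (the folder names are illustrative; each line is a 'priority mailbox' pair, and the lowest priority is emptied first):

```text
1 Junk
2 Trash
3 Sent
```

With this, Junk is purged before Trash, and Sent is touched only as a last resort.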

The Expire plugin helps in running a cron job for deleting old emails from the Trash and Junk folders. You can read more about this at http://wiki2.dovecot.org/Plugins/Expire

We'll be using SpamAssassin for filtering spam and, in fact, Antispam does have a SpamAssassin backend. But when I was setting up the server, the SpamAssassin mode didn't work properly and it kept causing crashes.

The spool2dir backend copies an email to the specified folders when it is marked as spam or ham (not spam) by a user. This enables us to have a learning-filter kind of setup, where SpamAssassin learns from user activity. We'll look at how to set up SpamAssassin to learn about the mails from those directories when we configure SpamAssassin.

The Quota plugin backend needs to be configured in 90-quota.conf:

plugin {
  quota = dict:User quota::proxy::quota
}

Since quota usage is stored in the SQL database, the quota backend points at the dictionary proxy, which in turn uses the connection configured in dovecot-dict-sql.conf.

In order to properly filter incoming spam, we need a Sieve filter that executes before all filters. SpamAssassin marks emails with a special spam header. The Sieve filter will check every incoming mail for the header and, if required, move the mail accordingly to the spam folder.

The Sieve configuration, in 90-sieve.conf, is given below:

plugin {
  sieve_before = /var/lib/dovecot/sieve/before/
  recipient_delimiter = +
}
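A sketch of such a global before-script – placed, say, at /var/lib/dovecot/sieve/before/spam.sieve (the filename is an assumption) – could be:

```text
require ["fileinto"];

# SpamAssassin adds an X-Spam-Flag: YES header to messages it
# classifies as spam; file those into the Junk folder and stop.
if header :contains "X-Spam-Flag" "YES" {
    fileinto "Junk";
    stop;
}
```

Scripts in the sieve_before directory run before any user-defined filters, so spam never reaches a user's own rules.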


That's it! Dovecot configuration is now complete. So let's create a domain and a user in our database. First, we need to encrypt the password of the user before insertion. For this, we'll use Dovecot's doveadm utility:

doveadm pw -s SHA512-CRYPT
Enter new password:
Retype new password:
{SHA512-CRYPT}$6$zK6YFoQ/Axi8jlaw$Vbp0n69fBCp6bVE2lNVmrjRmYZrAA5nb1mwgwinRO1iWSe/i.q9sWTO1qw62eEdLY0MLzlgRJFEYMtFYrXSY4/

The part after {SHA512-CRYPT} in the output is the hash to be inserted in the database.
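If you want to sanity-check such hashes outside Dovecot, note that SHA512-CRYPT is the same $6$ scheme implemented by the Unix crypt() function. A small sketch using Python's crypt module (Unix-only, and removed in Python 3.13) shows both generation and verification:

```python
import crypt

# Generate a SHA512-CRYPT ($6$...) hash, as doveadm pw -s SHA512-CRYPT does.
salt = crypt.mksalt(crypt.METHOD_SHA512)
hashed = crypt.crypt("foo", salt)
print(hashed.startswith("$6$"))

# Verification: re-hash the candidate password using the stored hash as salt;
# a match means the password is correct.
print(crypt.crypt("foo", hashed) == hashed)
print(crypt.crypt("bar", hashed) == hashed)
```

This is exactly the check Dovecot performs against the password column fetched by password_query.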

psql mail
mail=# insert into domains (name) values ('accessiblehawk.com');
mail=# select * from domains;
mail=# insert into users (domain_id, name, password, quota_kbytes) values (1, 'nilesh', '$6$zK6YFoQ/Axi8jlaw$Vbp0n69fBCp6bVE2lNVmrjRmYZrAA5nb1mwgwinRO1iWSe/i.q9sWTO1qw62eEdLY0MLzlgRJFEYMtFYrXSY4/', 0);

We can now test the server using Telnet:

# telnet localhost 143
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE STARTTLS AUTH=PLAIN AUTH=LOGIN] mail.accessiblehawk.com Accessible Hawk E-Mail Service
1 LOGIN nilesh@accessiblehawk.com foo
1 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS SPECIAL-USE BINARY MOVE SEARCH=FUZZY NOTIFY] Logged in
2 LOGOUT
* BYE Logging out
2 OK Logout completed.
Connection closed by foreign host.

In the next part in this series of articles, we'll look at how to configure Postfix and other parts. By the way, I've deleted my account on the server, so don't attempt a login with the password 'foo' ;-)

By: Nilesh Govindrajan

The author is a student of engineering in Pune and co-founder of Accessible Hawk, a company dealing with Web hosting, email and virtual machine services. He can be contacted at me@nileshgr.com or @nileshgr on Twitter.

RAMCloud: The Future of Storage Systems

In RAMCloud, data is stored in the DRAM of thousands of computers in a data centre. RAMCloud offers quick and reliable recovery even though terabytes of data may be stored in the system.

Today, the amount of data generated on the Internet is enormous. An application like Facebook needs to deal with terabytes or even petabytes of data without compromising its performance. RAMCloud is a next-generation storage system that can deliver high performance with just commodity hardware, even at current levels of storage complexity. It stores data entirely in DRAM (main memory), and the disk takes the role of backup or archival storage. Since the data always resides in main memory, it avoids the access latency usually incurred in a disk-based storage system and, hence, provides high throughput, which is the key to better performance.

The current scenario

For the past four decades we have seen rapid growth in computer hardware technologies, which has helped improve the efficiency of storage systems. The processor, memory and disk play an important role in the performance of a storage system. Currently, an imbalance in the performance of any one of these components can impact the whole system. Though there has been tremendous improvement in the performance of both memory and processor, the disk has not been able to keep pace. Disk capacity has increased by more than a thousand times, but the transfer rate for large blocks has improved only fifty-fold, while seek time and rotational latency have only improved two-fold.

Large applications like Facebook and Amazon require multiple access points to storage servers to generate a single page. Due to the high access rate the performance of these applications is reduced. Applications use cache to overcome disk latency, but cache must have an exceptionally high hit rate to provide significant performance improvement. Even a 1 per cent cache miss can severely affect the system’s performance, which is not acceptable for some applications.

Flash memory is another storage system that offers latency lower than disk. But Flash devices are I/O devices, so apart from the access latency, they have additional latencies of device drivers and interrupt handlers. These shortcomings of current storage systems demand a new improved storage approach.

An overview of RAMCloud

RAMCloud is a storage system that stores data in the DRAM of thousands of servers within a data centre, as shown in Figure 1.

Since the information is kept in DRAM at all times, access latency is very low – 100-1000x lower than in disk-based systems – and throughput is 100-1000x greater. Most Web applications grow over a period of time and will require more servers to store their data. RAMCloud will scale automatically to support the growing number of servers added to the system.

RAMCloud uses DRAM, which is volatile memory, i.e., the data is lost when power is removed. However, applications require storage systems to provide a high level of data durability and availability. RAMCloud uses a technique called buffered logging to maintain durability. In this approach, a single copy of each data object is stored in the DRAM of a primary server, while redundant copies are kept on the disks of two or more backup servers; each server acts as both a primary and a backup. When a write operation is performed, the primary server updates its DRAM and forwards log entries to the backup servers, where they are stored temporarily in the backup servers' DRAM. Each backup collects log entries into batches that can be written efficiently to a log on disk. Once log entries have been written to disk, they can be removed from the backup's DRAM.
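The write path just described can be sketched in a few lines of Python. All names here are invented for illustration, and the "disk" is simulated by a list – this is a toy model of the idea, not RAMCloud's implementation:

```python
class Backup:
    """Backup server: buffers log entries in DRAM, flushes to disk in batches."""
    def __init__(self, batch_size=3):
        self.buffer = []      # log entries held temporarily in backup DRAM
        self.disk_log = []    # entries persisted to the on-disk log (simulated)
        self.batch_size = batch_size

    def append(self, entry):
        self.buffer.append(entry)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # One large sequential disk write per batch, then free the DRAM copy.
        self.disk_log.extend(self.buffer)
        self.buffer.clear()

class Primary:
    """Primary server: the authoritative copy of each object lives in DRAM."""
    def __init__(self, backups):
        self.dram = {}
        self.backups = backups

    def write(self, key, value):
        self.dram[key] = value
        # Forward a log entry to every backup before acknowledging the write.
        for b in self.backups:
            b.append((key, value))

backups = [Backup(), Backup()]
primary = Primary(backups)
for i in range(4):
    primary.write(f"k{i}", i)

print(primary.dram["k3"])        # 3
print(backups[0].disk_log)       # first flushed batch of three entries
print(backups[0].buffer)         # fourth entry still buffered in DRAM
```

Reads are served purely from the primary's DRAM, which is why both reads and writes proceed at memory speed.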

Buffered logging allows both reads and writes to proceed at DRAM speeds while still providing durability. Power failures can be handled by committing each write operation to stable storage.

The data model

The data model for a storage system governs how data is collected, stored, integrated and put to use. There are three main factors that we need to decide on prior to selecting the type of data model:

1. The nature of the basic objects stored in the system.
2. How basic objects are organised into higher-level structures; for example, we can either just have key-value pairs or some sort of aggregation.
3. The methods for naming and indexing objects when retrieving or modifying them.

The two common types of data models are the highly structured relational data model and the unstructured data model. RAMCloud prefers an intermediate approach where servers do not impose structure on data but do support aggregation and indexing. It supports any number of tables; each table stores multiple objects, and these objects are simple key-value pairs. It also provides a simple set of operations for creating and deleting tables, and for reading, writing and deleting objects within a table.
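That interface is small enough to sketch directly. The following toy class (all names invented; real RAMCloud is a distributed system, not a dictionary) mirrors the operations just listed – create/delete tables, and read/write/delete key-value objects within them:

```python
class RAMCloudModel:
    """Toy model of the RAMCloud data model: named tables of key-value objects."""
    def __init__(self):
        self.tables = {}

    def create_table(self, name):
        self.tables[name] = {}

    def drop_table(self, name):
        del self.tables[name]

    def write(self, table, key, value):
        self.tables[table][key] = value

    def read(self, table, key):
        return self.tables[table][key]

    def delete(self, table, key):
        del self.tables[table][key]

rc = RAMCloudModel()
rc.create_table("users")
rc.write("users", "42", b"nilesh")
print(rc.read("users", "42"))    # b'nilesh'
```

Note that there are no joins or secondary structures in the core model – that is the price of keeping the servers structure-agnostic.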

Research challenges

Numerous challenges need to be overcome for RAMCloud to be implemented successfully. Given below is a short description of various challenges that researchers are trying to solve.

[Figure 1: RAMCloud architecture – 1,000 to 100,000 application servers connect through a high-speed data centre network (around 5 µs round trips, full bisection bandwidth) to 1,000 to 100,000 commodity storage servers, each with 32-256 GB of DRAM and acting as both master and backup, managed by a coordinator.]

[Figure 2: Buffered logging]

Consider applications that use the TCP/IP protocol – they suffer long round-trip times for remote procedure calls and high latency in network switches. Also, the flow-oriented nature of TCP is of little use to RAMCloud, since individual requests will be relatively small. To improve overall latency, we can either modify TCP or replace it with a UDP-based protocol. An increasing number of applications use virtualisation for greater portability. This increases overheads, since an incoming packet must now pass through the virtual machine monitor and a guest operating system before reaching the application, increasing the overall latency. So we need techniques like passing packets directly from the virtual machine monitor to the application to reduce this overhead.

RAMCloud is implemented using a large number of servers, but the applications that use them must see this as a single storage system, i.e., the application must be oblivious to the distribution of the storage system. The primary issue in the distribution and scaling of the system is data placement. An object may need to be moved to another server to improve performance, and this data movement needs to happen automatically and in real time.

A single RAMCloud system can be used to support multiple applications of varying sizes. It should provide a security mechanism to isolate mutually hostile applications. Also, one application with a very high workload must not degrade the performance of other applications.

Finally, RAMClouds must manage themselves automatically. There are thousands of servers, each using hundreds of peers, which makes the overall design too complicated to be handled by humans.

Why use RAMCloud?

We believe there are two main motivations for using RAMCloud.

Application scalability

Most Web applications use relational databases to store their data. As the application grows, it becomes difficult to store the entire data in a single relational database. Applications then use other techniques to manage their data. A popular technique is ad-hoc partitioning, where data is split among multiple databases. As the application grows larger, maintaining consistency among multiple databases becomes increasingly difficult and requires ever more complex techniques. Another storage technique is Bigtable, which is built on top of the Google File System. Because of the distributed nature of a Bigtable database, certain database operations, like a join between two tables, would be terribly inefficient. On the other hand, RAMCloud will automatically scale to support the increasing number of storage servers used by an application.

The technology

The disk is used as the primary storage system for Web applications. Accessing large blocks at one time from a disk may be beneficial compared to accessing small blocks. However, most forms of online data, such as images and songs, do not comprise large blocks, so the latency for accessing smaller, more frequent blocks is high. Large Web applications need to make multiple internal requests to generate a single HTML page, so we must consider the cumulative latency of all the requests when considering the overall response time to users. One of the major advantages of RAMCloud over the disk-oriented approach is that it can dramatically reduce the access latency of a request and, thereby, reduce the overall response time. RAMCloud also supports a new class of data-intensive applications, which process data in large volumes – typically, in terabytes.

The pros and cons

Here are some of the pros and cons of using RAMCloud.

The pros

1. Since all the information is stored in DRAM, RAMCloud provides high throughput.
2. RAMCloud automatically scales to support a large number of storage servers and eliminates scalability issues in applications.
3. It provides a high level of data durability and availability.
4. The cost of storing data in DRAM today is about the same as storing data on disk ten years ago.
5. RAMCloud uses a log structure, similar to a log-structured file system, for all its data in DRAM as well as on disk. This provides fast crash recovery.
6. RAMClouds are 100-1000x more efficient than disk-based systems in terms of cost per operation or energy per operation.

The cons

1. It involves a higher cost per bit and higher energy per bit, so RAMCloud storage will be 50-100x more expensive than a pure disk-based system.
2. Maintaining consistency for applications that require replication across data centres is very difficult.

With the growth of large-scale Web applications, there has been a need for alternatives to disk-based storage. Both Google and Yahoo store their search indices entirely in DRAM. The Bigtable storage system allows entire column families to be loaded into memory, where they can be read without any disk accesses. We believe that RAMCloud is a long-term solution for the storage needs of Web applications. RAMCloud provides durability and very low latency, and hence enables richer query models; this makes it attractive for technologies like cloud computing. It is able to aggregate the resources of a large number of commodity servers. However, a lot of research still needs to be done and numerous challenges must be overcome in order to use this technology.


Acknowledgements

I would like to thank Dr John Ousterhout, professor of Computer Science at Stanford University. He is the lead at the RAMCloud project at Stanford University. I would also like to thank my mentor and all the people who helped me to review this article.

By: Sakshi Bansal

The author is in her fourth year of the Computer Science and Engineering bachelor's degree at Amrita Vishwa Vidyapeetham, Amritapuri. She is a FOSS enthusiast and an active member of the Amrita FOSS club, having made contributions to various open source projects such as Mozilla Thunderbird, Mediawiki, etc. She blogs at http://sakshiii.wordpress.com/.
