Deploying RStudio Server for Classrooms
Overview
Deploying RStudio Server for use in classrooms has a number of compelling benefits:
- RStudio's ease of learning and use reduces the overall learning curve for R.
- The friction and ongoing support required to ensure that all students have a fully functional (and uniform) R setup on their own computers is eliminated.
- Packages and other external tools and libraries (e.g. TeX) can be installed once on a server and then be immediately available to all students.
- Students can access their RStudio session from any computer, making it extremely straightforward to transition between lab and personal computers.
One of the challenges associated with deploying RStudio for classroom use is ensuring that individual student sessions occupy minimal resources (so that hundreds or even thousands of students can be supported by a single server). This article covers general advice on deploying the server for classrooms as well as some specific tips for ensuring that large numbers of students can use the system.
Server Platform
RStudio Server is supported on a number of platforms including Debian, Ubuntu, and RedHat/CentOS. The supported feature set is largely the same across platforms, but we recommend Ubuntu for classrooms for several reasons:
- We host a beta version of RStudio Server on our own servers and our deployment platform is Ubuntu Server. This means that we have greater familiarity with Ubuntu, detect and resolve problems on Ubuntu sooner, and tend to implement robustness-oriented features for Ubuntu first.
- On Ubuntu we integrate with the Upstart service, which ensures that RStudio Server stays running even if unexpected problems occur.
- On Ubuntu we integrate with AppArmor, which provides an additional layer of security for the server.
Getting Started
Installation
To install RStudio Server, follow the instructions in the Getting Started document. Once you have verified that you can connect to the server, you can proceed with additional configuration.
Using a Proxy Server
When deploying RStudio Server for access over the internet (as opposed to on an internal network) we strongly recommend using Apache or Nginx to proxy requests to RStudio Server. The specifics of doing this are covered in the Using a Proxy Server document. The benefits of using an HTTP proxy include:
- Improved network performance -- Apache and Nginx both implement HTTP keep-alive, which significantly improves performance especially on slower or saturated connections.
- Improved security -- Apache and Nginx both have facilities for connection throttling and for rejecting malformed, potentially malicious requests. They also keep a log of all HTTP requests, which can be useful for troubleshooting problems after the fact.
- Sharing the server between several applications -- when using a proxy server it is possible to configure RStudio to be accessed over port 80 using a custom path, for example: http://myserver.edu/rstudio. This enables you to host RStudio Server on a machine also configured to host other web content or applications.
Authenticating Users
By default RStudio Server authenticates users based on the system user database. So if a user has an account on the server where RStudio is installed, they can log in using their system username and password (passwords are encrypted during transmission). You can manage these users using the standard Linux user administration tools like useradd, userdel, etc. Note that each user needs to be created with a home directory. This can be done by passing the -m option to useradd.
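For a whole class, account creation can be scripted. The following is a minimal sketch, not an official RStudio tool: it assumes a hypothetical roster file with one username per line, and the function name `create_students` and the `DRY_RUN` variable are illustrative. By default it only prints the `useradd -m` commands it would run, so the list can be reviewed first.

```shell
#!/bin/sh
# Sketch: emit (or run) one `useradd -m` per username listed in a roster file.
# Assumptions: the roster holds one username per line; blank lines and lines
# starting with '#' are ignored; DRY_RUN defaults to 1 (print only).

create_students() {
  roster=$1
  while IFS= read -r user || [ -n "$user" ]; do
    # Skip blank lines and comment lines.
    case $user in
      ''|'#'*) continue ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show the command instead of executing it.
      echo "useradd -m $user"
    else
      # -m creates the home directory that RStudio Server requires.
      useradd -m "$user"
    fi
  done < "$roster"
}
```

Running `create_students roster.txt` prints the commands; running it as root with `DRY_RUN=0` executes them. Accounts created this way have no password yet, so a password still has to be set for each student (for example with passwd) before they can log in.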
The underlying mechanism used by RStudio for authentication is Linux PAM (Pluggable Authentication Modules). Using PAM, RStudio can also be configured to authenticate users against LDAP or Active Directory servers, or other custom authentication schemes. Note, however, that RStudio does not currently implement the PAM Session API, so if your local PAM configuration requires the PAM Session API (e.g. for mounting home directories from an NFS share) then RStudio won't work as expected.
Optimizing Your Configuration
Overview
To support large numbers of students, RStudio Server should be configured to limit the total resources that can be consumed by individual student sessions. Assuming that students aren't running long computations, a single server with 4 CPUs should be able to support a large number of concurrent students.
This section describes several of the more useful configuration settings in an environment with a potentially large number of student users. Before proceeding you should make sure you've reviewed the basic RStudio Server documentation on Configuration and Server Management.
Each of the entries described below is prefaced with the configuration file it should be added to (RStudio Server has two different configuration files, /etc/rstudio/rserver.conf and /etc/rstudio/rsession.conf).
/etc/rstudio/rserver.conf
You can set a maximum number of megabytes of memory that may be allocated by each R session:
rsession-memory-limit-mb=1000
You may also want to limit the number of processes which can be simultaneously created within a session. This setting is highly recommended as it prevents users who have inadvertently executed a command that creates a large number of processes from negatively affecting other users.
rsession-process-limit=30
/etc/rstudio/rsession.conf
You can specify the number of minutes of idle time before RStudio suspends an R session. After being suspended, the session no longer takes up any resources on the server. The session is automatically resumed the next time the user accesses it.
session-timeout-minutes=10
If you are using XFS disk quotas then RStudio can be configured to check the user's disk usage periodically and warn them if they are either over or getting close to their quota:
limit-xfs-disk-quota=1
You may also want to specify a maximum size for uploaded files. You can do this using:
limit-file-upload-size-mb=50
If you are particularly concerned about the potential for CPUs being tied up in long computations (perhaps even by mistake) then you can also set a maximum number of minutes for top-level console commands:
limit-cpu-time-minutes=10
This setting will result in RStudio calling the setTimeLimit function right before it processes each new console input from the user.
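If you manage more than one server, the entries described in this section can be applied with a small script rather than by hand. The sketch below is only an illustration: `set_conf` and `apply_classroom_settings` are hypothetical helper names, not part of RStudio Server, and the values are simply the examples used in this article. It takes the configuration directory as an argument so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper (not part of RStudio Server): set key=value in a config
# file, replacing an existing entry for that key or appending a new one.
set_conf() {
  file=$1; key=$2; value=$3
  [ -f "$file" ] || : > "$file"
  if grep -q "^${key}=" "$file"; then
    tmp=$(mktemp)
    # Rewrite through a temp file, then copy the contents back so the
    # original file keeps its ownership and permissions.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Apply some of the example classroom settings from this article to the
# config files under the given directory (/etc/rstudio on a real server).
apply_classroom_settings() {
  dir=$1
  set_conf "$dir/rserver.conf"  rsession-memory-limit-mb 1000
  set_conf "$dir/rserver.conf"  rsession-process-limit 30
  set_conf "$dir/rsession.conf" session-timeout-minutes 10
  set_conf "$dir/rsession.conf" limit-file-upload-size-mb 50
}
```

On a real server this would be run as root with `apply_classroom_settings /etc/rstudio`, followed by a restart of the server (for example `sudo rstudio-server restart`) so that the new settings take effect. Because `set_conf` replaces an existing entry instead of appending a duplicate, the script can be re-run safely after changing a value.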