Snapster Report
The Design

        Snapster is a generic file sharing system much like Napster. It consists of a centralized server and many client-servers. The central server handles five things: creating new user accounts, user logins, user logouts, keyword searches, and shared file updates. It is designed to perform a minimal amount of work in order to streamline the search mechanism and thereby provide the fastest search results possible. The rest of the load is passed off to the client-servers. In a nutshell, the client portion of a client-server provides a remote user with an interface to the centralized server and to other client-servers. Through this interface, a remote user can connect to the centralized server, upload file descriptors to it, perform keyword searches on the available list of shared file descriptors, and request to download files from other client-servers. The server portion of a client-server is responsible for sending requested files to other client-servers. On an extremely active system, the client-servers work nearly as hard as the centralized server. As I mentioned before, this is good because it allows the centralized server to provide much faster search results.


Platform

        The Snapster system is designed to be platform independent by using the Java programming language. While this sounds like a magnificent achievement, it is not entirely true: the Java Runtime Environment required to run Snapster is only available to users of Solaris, Linux, Windows, Tru64 Unix, HP-UX, and Irix. But six is better than one. Not only that, the client-server interface is an Applet and will run in either Netscape or Internet Explorer, provided that the Java 1.2.2 Plugin or higher is installed. The Applet is accessible through a webpage that checks for the plugin and attempts to download it automatically if it is not already installed. Since the Snapster Applet requires special privileges, like file reads and file writes, it will be cryptographically signed with a certificate from a trusted root authority, to ensure its authenticity, before it runs on a remote machine.


Implementation

        The design outlined above is a simple overview of the vast number of features provided by the Snapster system. The remainder of this text will focus on the underlying implementation of these features. I will start with the centralized server.


Figure 1 – The Basic Architecture (Varadarajan Project Spec.)


Centralized Server

        The central server is built on the Java RMI (Remote Method Invocation) architecture. RMI is Sun Microsystems’s version of the more general RPC (Remote Procedure Call) architecture. RMI allows client machines to call methods that reside on the central server as if they were on the local machine. The central Snapster server has five remote methods, each corresponding directly to one of the central server’s five responsibilities:

-    Creating new user accounts
-    Authenticating users and uploading shared file descriptors
-    Removing shared file descriptors when users log out
-    Performing keyword searches through the available file descriptors
-    Updating shared file descriptors
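        The five responsibilities map naturally onto a Java RMI remote interface. The sketch below is illustrative only: the interface name SnapsterServer and all method names and signatures are hypothetical rather than taken from the Snapster source, and it uses modern Java generics, which the Java 1.2 platform did not have. The parts that are not assumptions are RMI’s own rules: a remote interface extends java.rmi.Remote, and every remote method declares java.rmi.RemoteException.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Vector;

// Hypothetical remote interface for the central server: one method per responsibility.
// In a real project this interface would be public and live in its own file.
interface SnapsterServer extends Remote {
    // Creating new user accounts: returns false if the userid is already taken.
    boolean newUser(String userid, byte[] encryptedPassword) throws RemoteException;

    // Authenticating a user and uploading his or her shared file descriptors.
    boolean login(Vector<Object> userData) throws RemoteException;

    // Removing the user's shared file descriptors on logout.
    boolean logout(String userid, byte[] encryptedPassword) throws RemoteException;

    // Keyword search: one String[] entry per matching (user, file) pair.
    Vector<String[]> search(String userid, String keyword) throws RemoteException;

    // Updating shared file descriptors, e.g. when a shared file has disappeared.
    void removeSharedFile(String userid, String filename) throws RemoteException;
}
```

A server class would implement this interface (typically by extending java.rmi.server.UnicastRemoteObject) and register itself in an RMI registry so that clients can look it up by name.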


Creating New User Accounts

        Before a user can log in, he or she must create a new user account. This is necessary so that all the users on the system have unique usernames, and it is required by the project spec. The usernames are used in two ways: as keys for quick and easy lookup of user data structures, and as a filter on search results so that the person submitting a query does not receive his own shared files in the response. When the server starts, it checks whether a “valid users” file exists. If it does, the server loads the serialized Vector from that file into memory. (A Vector is simply an array that grows and shrinks dynamically; serializing it flattens it into a form that can be written to a binary file and read back later.) When a client sends a new user request, the server checks whether that userID has already been taken; if so, it returns false and the client displays the appropriate message to the user. Otherwise, the server adds the userID and password to the valid users structure in memory and then overwrites the valid users file with the new structure, so that the new user will still be able to log in the next time the server is restarted.
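        A minimal sketch of the valid users file handling, assuming each entry in the serialized Vector is a two-element String[] holding the userID and password (the real entry layout is not documented). The class name ValidUsers and its methods are hypothetical. It uses standard Java serialization (ObjectOutputStream and ObjectInputStream) to write and re-read the Vector, and it rewrites the whole file after every new account, as described above.

```java
import java.io.*;
import java.util.Vector;

// Hypothetical valid-users store: a serialized Vector of {userid, password} pairs.
// The entry layout is an assumption; the original does not document it.
class ValidUsers {
    private final File file;
    private final Vector<String[]> users;

    ValidUsers(File file) throws IOException, ClassNotFoundException {
        this.file = file;
        // On startup, load the serialized Vector if the "valid users" file exists.
        this.users = file.exists() ? load(file) : new Vector<String[]>();
    }

    @SuppressWarnings("unchecked")
    private static Vector<String[]> load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(f)))) {
            return (Vector<String[]>) in.readObject();
        }
    }

    // Returns false if the userid is taken; otherwise adds the user and rewrites the file.
    synchronized boolean newUser(String userid, String password) throws IOException {
        for (String[] u : users) {
            if (u[0].equals(userid)) return false;
        }
        users.add(new String[] {userid, password});
        try (ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(users); // overwrite the whole file with the updated Vector
        }
        return true;
    }

    // True only if the userid exists and the password matches.
    synchronized boolean isValid(String userid, String password) {
        for (String[] u : users) {
            if (u[0].equals(userid)) return u[1].equals(password);
        }
        return false;
    }
}
```

Both methods are synchronized because several RMI calls can reach the server at the same time (see Central Server Concurrency Considerations).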


Logging In

        After the user account has been created, the user can log in and begin sharing files. When a user logs in, the client creates a user data structure like this and passes it to the remote login function:

Vector userData

String userid
byte[] password(encrypted)
String hostname
String portnumber
String connection speed
HashMap fileList

        The userid is pulled right from the user structure on the client side, as is the password, which is encrypted with the RC6 encryption algorithm and stored as an array of bytes. When the client is first started, it spawns a file server thread. The file server thread stores the hostname of the local computer and the first port between 2000 and 9000 that it is able to bind to. In the login function on the client side, the hostname and port number are then read and copied into the userData structure that is sent to the central server. The connection speed can be set by the remote user at any time on the user settings screen, in the Smenu along the top of the Snapster interface; its default value is Modem. Before sending the data structure, the client login function calls a get shared files function, which returns a HashMap that maps file names to file sizes.
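        The port selection performed by the file server thread can be sketched as a linear scan that keeps the first successful bind. This is an assumption about how the scan works; PortFinder and bindFirstFree are hypothetical names.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch: try each port from low to high and keep the first one that binds.
class PortFinder {
    static ServerSocket bindFirstFree(int low, int high) throws IOException {
        for (int port = low; port <= high; port++) {
            try {
                return new ServerSocket(port); // succeeds only if nothing else holds the port
            } catch (IOException inUse) {
                // port is taken (or not permitted); try the next one
            }
        }
        throw new IOException("no free port between " + low + " and " + high);
    }
}
```

The client would call bindFirstFree(2000, 9000), copy getLocalPort() of the returned socket into userData, and obtain the hostname with InetAddress.getLocalHost().getHostName().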

HashMap fileList

String filename 0    String filesize 0
String filename 1    String filesize 1
String filename 2    String filesize 2
.
.
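        A sketch of the get shared files function, assuming the shared directory is walked with java.nio.file.Files.walk (a modern API; code written for Java 1.2 would have used java.io.File instead). SharedFiles and getSharedFiles are hypothetical names. Sizes are stored as Strings to match the fileList layout above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical "get shared files": walk the shared directory and all of its
// subdirectories, mapping each file name to its size in bytes.
class SharedFiles {
    static HashMap<String, String> getSharedFiles(Path sharedDir) throws IOException {
        HashMap<String, String> fileList = new HashMap<>();
        try (Stream<Path> paths = Files.walk(sharedDir)) {
            Iterator<Path> it = paths.filter(Files::isRegularFile).iterator();
            while (it.hasNext()) {
                Path p = it.next();
                // Keyed by bare file name, so two files with the same name in different
                // subdirectories collide and only one of them is kept.
                fileList.put(p.getFileName().toString(), Long.toString(Files.size(p)));
            }
        }
        return fileList;
    }
}
```

The default shared directory, the user’s home directory, is available as System.getProperty("user.home").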

        The get shared files function iterates through the shared directory and all of its subdirectories. The shared directory can be set on the user settings screen; its default value is the user’s home directory, on whichever operating system the user is running. Once the client login function has gathered all this information, it passes it to the central server. When the central server receives the structure, it first checks that the user is valid and not already online; then it decrypts the password and compares it with the password stored in the valid users structure. If the passwords match, the user is in, and the server adds the userid and userData structure to a HashMap called onlineUsers, which is organized in a similar way to fileList.

HashMap onlineUsers

String userid 0    Vector userData 0
String userid 1    Vector userData 1
String userid 2    Vector userData 2
.
.

        The central server also adds the filename and userid to another HashMap meant to streamline the search mechanism.

HashMap sharedFiles

String filename 0    Vector userList 0
String filename 1    Vector userList 1
String filename 2    Vector userList 2
.
.

        This structure simplifies the search mechanism in two ways. First, it puts all the shared filenames in one place where they can easily be iterated through. Second, because each filename maps to the list of all users sharing it, a filename shared by many users is compared against the search string only once, which reduces the number of comparisons the search has to make.
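        The login bookkeeping on the central server can be sketched as below. It is a hypothetical sketch under several assumptions: userData follows the order listed earlier (index 0 is the userid, index 5 the fileList), the password has already been decrypted (RC6 is not part of the standard JDK, so decryption is left out), and valid users are looked up in a Map rather than by scanning the Vector. LoginIndex is an invented name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical server-side login bookkeeping for onlineUsers and sharedFiles.
class LoginIndex {
    static final int USERID = 0, FILELIST = 5; // assumed positions in userData

    final Map<String, String> validUsers = new HashMap<>();          // userid -> password
    final Map<String, Vector<Object>> onlineUsers = new HashMap<>(); // userid -> userData
    final Map<String, Vector<String>> sharedFiles = new HashMap<>(); // filename -> userList

    // synchronized so the check-then-add sequence is atomic across concurrent calls
    @SuppressWarnings("unchecked")
    synchronized boolean login(Vector<Object> userData, String decryptedPassword) {
        String userid = (String) userData.get(USERID);
        String stored = validUsers.get(userid);
        if (stored == null || onlineUsers.containsKey(userid)) return false; // unknown or already online
        if (!stored.equals(decryptedPassword)) return false;
        onlineUsers.put(userid, userData);
        Map<String, String> fileList = (Map<String, String>) userData.get(FILELIST);
        for (String filename : fileList.keySet()) {
            // one key per filename; its userList collects every user sharing that name
            sharedFiles.computeIfAbsent(filename, k -> new Vector<>()).add(userid);
        }
        return true;
    }
}
```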


Logging Out

        Logging a user out is basically the reverse of logging the user in. First, the central server checks to make sure the userid and password are correct. Then, it removes the userData structure from the onlineUsers HashMap. Finally, it goes through the sharedFiles HashMap and removes the filenames that correspond to shared files listed in the userData structure. Note: it only removes the file mapping if other users are not sharing the same filename. If other users are sharing the same filename, then it simply removes the userid from the userList that corresponds to the filename.
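        A sketch of the logout bookkeeping, assuming the userData layout listed earlier (index 5 holds the fileList) and that the userid and password have already been checked. LogoutIndex is a hypothetical name; the key step is that a filename is dropped from sharedFiles only when its userList becomes empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical logout: the reverse of login.
class LogoutIndex {
    final Map<String, Vector<Object>> onlineUsers = new HashMap<>(); // userid -> userData
    final Map<String, Vector<String>> sharedFiles = new HashMap<>(); // filename -> userList

    @SuppressWarnings("unchecked")
    synchronized void logout(String userid) {
        Vector<Object> userData = onlineUsers.remove(userid);
        if (userData == null) return; // user was not online
        Map<String, String> fileList = (Map<String, String>) userData.get(5); // assumed fileList slot
        for (String filename : fileList.keySet()) {
            Vector<String> users = sharedFiles.get(filename);
            if (users == null) continue;
            users.remove(userid);
            if (users.isEmpty()) sharedFiles.remove(filename); // nobody else shares this name
        }
    }
}
```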


Keyword Searching

        Now that the structures are in place, it is easy to see how searching is accomplished. When a client sends a request to search for a substring, the central server builds a temporary array of shared filenames from the sharedFiles key list. It then sequentially searches each filename for that substring. When it finds a match, it iterates through the corresponding userList and retrieves each user’s userData structure from the onlineUsers HashMap, adding one entry to the searchResults Vector for each user in that list. The Vector is quite simply a list of String arrays:

Vector searchResults

String[] entry 0
String[] entry 1
String[] entry 2
.
.

String[] entry

String hostname
String portnumber
String connection speed
String username
String filesize
String filename

        When the search finishes, the searchResults Vector is sent to the user who made the query, whose client formats the entries for display. That is pretty much all the server does. It is a simple sequential scan, so it may not sound like the fastest search mechanism in the world, but it is pretty darn fast (probably not as fast as Google, though :)
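        The search described above could look roughly like this. It is a hypothetical sketch: it assumes the userData layout listed earlier (0 userid, 2 hostname, 3 port number, 4 connection speed, 5 fileList) and a plain case-sensitive substring match, and it applies the filter mentioned under Creating New User Accounts so the requester never sees his own files. Each entry follows the String[] entry layout above; KeywordSearch is an invented name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical keyword search over sharedFiles, filling searchResults entries.
class KeywordSearch {
    final Map<String, Vector<Object>> onlineUsers = new HashMap<>(); // userid -> userData
    final Map<String, Vector<String>> sharedFiles = new HashMap<>(); // filename -> userList

    @SuppressWarnings("unchecked")
    synchronized Vector<String[]> search(String requester, String keyword) {
        Vector<String[]> searchResults = new Vector<>();
        String[] filenames = sharedFiles.keySet().toArray(new String[0]); // temporary array of keys
        for (String filename : filenames) {
            if (!filename.contains(keyword)) continue; // one comparison per distinct filename
            for (String userid : sharedFiles.get(filename)) {
                if (userid.equals(requester)) continue; // don't return the requester's own files
                Vector<Object> userData = onlineUsers.get(userid);
                if (userData == null) continue;
                Map<String, String> fileList = (Map<String, String>) userData.get(5);
                searchResults.add(new String[] {
                    (String) userData.get(2), // hostname
                    (String) userData.get(3), // portnumber
                    (String) userData.get(4), // connection speed
                    userid,                   // username
                    fileList.get(filename),   // filesize
                    filename                  // filename
                });
            }
        }
        return searchResults;
    }
}
```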


Updating Shared Files

        The last responsibility of the server is to track shared files that are added to or removed from a user’s shared directory. Currently, when a client-server receives a request from another user for a file that no longer exists or is no longer being shared, the client-server sends a message to the central server telling it that the file is no longer being shared. As for new shared files, the system currently adds to the sharedFiles list only when a file is successfully transferred. In the future I plan on running a directory monitor on each client’s system, so that when users move files into or out of their shared directory, the changes are automatically sent to the central server.
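        The two update paths that exist today can be sketched as follows. SharedFileUpdates and its methods are hypothetical, and for brevity each user’s fileList is held in a map keyed by userid instead of inside the userData Vector. The sketch also assumes that a successfully downloaded file is saved into the downloader’s shared directory, which is why the download is recorded as a new shared file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical handlers for "file no longer shared" and "file successfully transferred".
class SharedFileUpdates {
    final Map<String, Map<String, String>> fileLists = new HashMap<>(); // userid -> (filename -> filesize)
    final Map<String, Vector<String>> sharedFiles = new HashMap<>();    // filename -> userList

    // A client-server reports that a requested file is gone or no longer shared.
    synchronized void removeSharedFile(String userid, String filename) {
        Map<String, String> files = fileLists.get(userid);
        if (files != null) files.remove(filename);
        Vector<String> users = sharedFiles.get(filename);
        if (users != null) {
            users.remove(userid);
            if (users.isEmpty()) sharedFiles.remove(filename); // last sharer is gone
        }
    }

    // After a successful transfer, record that the downloader now shares the file too.
    synchronized void addSharedFile(String userid, String filename, String filesize) {
        fileLists.computeIfAbsent(userid, k -> new HashMap<>()).put(filename, filesize);
        Vector<String> users = sharedFiles.computeIfAbsent(filename, k -> new Vector<>());
        if (!users.contains(userid)) users.add(userid); // ignore duplicate reports
    }
}
```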


Central Server Concurrency Considerations

        The central server has to be able to handle each of its responsibilities in parallel. This is largely taken care of by the RMI architecture: the RMI runtime may dispatch calls from different clients on separate threads, so any number of clients can call remote methods concurrently. That leads to one more thing to note: the global structures that are accessed by the remote methods must be synchronized. There are three global structures:

Vector validusers
HashMap onlineUsers
HashMap sharedFiles

        The Vector is automatically synchronized, but the HashMaps are not, so the HashMaps are wrapped in a synchronized wrapper class. This ensures that only one thread can structurally modify a data structure at a time. If two threads attempted to structurally modify the same structure at the same time, some data could be lost or put in the wrong place. This sounds like it would greatly hinder the performance of the central server, but it really isn’t that bad (and, in any case, there is no way around it): the remote methods grab the information they need and store it locally for as long as they need it, so other threads do not have to wait long while a single thread is accessing the data structures.
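        One standard way to wrap a HashMap is java.util.Collections.synchronizedMap; the original does not name the wrapper class it uses, so treat this as an assumption. The hypothetical SharedIndex class below shows two details that matter with such a wrapper: a get-then-put sequence must hold the map’s lock across both calls, and iterating over the map’s views requires holding the lock, which is why the keys are copied out first (the store-it-locally approach described above).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Vector;

// Hypothetical wrapper usage for the sharedFiles structure.
class SharedIndex {
    final Map<String, Vector<String>> sharedFiles =
            Collections.synchronizedMap(new HashMap<String, Vector<String>>());

    void share(String filename, String userid) {
        // Each single call (get, put, ...) is atomic, but get-then-put is two calls,
        // so hold the map's lock across both.
        synchronized (sharedFiles) {
            Vector<String> users = sharedFiles.get(filename);
            if (users == null) {
                users = new Vector<>();
                sharedFiles.put(filename, users);
            }
            users.add(userid);
        }
    }

    // Iterating over a synchronizedMap's views requires holding the map's lock.
    // Copying the keys keeps that critical section short.
    List<String> snapshotFilenames() {
        synchronized (sharedFiles) {
            return new ArrayList<>(sharedFiles.keySet());
        }
    }
}
```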


Client-Servers

        Most of the client-servers’ functionality has already been discussed above. The client-servers are largely an interface that enables remote users to interact with the central server. However, they also act as servers themselves, hence the name client-server. After the list of search results is displayed, the remote user can double-click a result to request a download, which starts a file retriever thread. The file retriever thread takes the hostname, port number, and filename of the file to get and connects to the client-server that is sharing the file (see Figure 1 above). That client-server has already started a file server thread (mentioned earlier). When a connection is made to the file server socket, a new file sender thread is started. This thread architecture was chosen so that any client-server can send and retrieve multiple files simultaneously.

CLIENT-SERVER    ----------->   FILE SERVER
|     |     |                   |     |     |     
V     V     V                   V     V     V
FILE RETRIEVERS. . .            FILE SENDERS . . .
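        The file server side of this diagram can be sketched as an accept loop that starts one new thread per connection. FileServer is a hypothetical name, and the file sender logic is passed in as a java.util.function.Consumer<Socket> so that the sketch stays self-contained.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Consumer;

// Hypothetical file server thread: accept connections in a loop and start a new
// file sender thread for each one, so several uploads can run at once.
class FileServer extends Thread {
    private final ServerSocket serverSocket;
    private final Consumer<Socket> fileSender;

    FileServer(ServerSocket serverSocket, Consumer<Socket> fileSender) {
        this.serverSocket = serverSocket;
        this.fileSender = fileSender;
        setDaemon(true); // don't keep the JVM alive just for this thread
    }

    @Override
    public void run() {
        while (true) {
            try {
                Socket client = serverSocket.accept(); // blocks until a file retriever connects
                new Thread(() -> fileSender.accept(client)).start(); // one sender thread per request
            } catch (IOException e) {
                return; // the server socket was closed (or failed): stop serving
            }
        }
    }
}
```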

        The file sender thread reads the requested filename from the file retriever and sends a yes or no response to indicate whether the file exists. If the file exists, the file sender sends the file size and immediately begins sending the file; the file retriever reads the file size and begins retrieving the file. The file sending and retrieving mechanism is quite interesting: the file input stream, socket output stream, socket input stream, and file output stream are all buffered. So, when a client-server reads from an input stream, each underlying read fills a buffer with as much data as is available, and when a client-server writes to an output stream, the data is collected in a buffer and sent in larger chunks. This reduces the number of low-level reads and writes, which streamlines the file transfer quite a bit, especially when reading from and writing to files.
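        A sketch of both ends of a transfer, under an assumed wire format: the retriever sends the filename with DataOutputStream.writeUTF, and the sender replies with a boolean (the yes or no answer), then, if the file exists, its size as a long followed by the raw bytes. FileTransfer, sendFile, and retrieveFile are hypothetical names. All four streams are buffered, and the retriever flushes its request explicitly, since otherwise it would sit in the output buffer while both sides wait.

```java
import java.io.*;
import java.net.Socket;

// Hypothetical sender/retriever exchange over a socket.
class FileTransfer {
    // File sender side: serves one request on an accepted socket, then closes it.
    static void sendFile(Socket socket, File sharedDir) throws IOException {
        try (Socket s = socket;
             DataInputStream in = new DataInputStream(new BufferedInputStream(s.getInputStream()));
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(s.getOutputStream()))) {
            // getName() drops any directory part of the request, so only files directly in
            // sharedDir can be served (no "../" escapes); subdirectories are not searched here.
            File file = new File(sharedDir, new File(in.readUTF()).getName());
            boolean exists = file.isFile();
            out.writeBoolean(exists); // the "yes or no" response
            if (exists) {
                long size = file.length();
                out.writeLong(size);
                try (InputStream fileIn = new BufferedInputStream(new FileInputStream(file))) {
                    copy(fileIn, out, size);
                }
            }
            out.flush();
        }
    }

    // File retriever side: returns false if the other client-server does not have the file.
    static boolean retrieveFile(String host, int port, String filename, File dest) throws IOException {
        try (Socket s = new Socket(host, port);
             DataInputStream in = new DataInputStream(new BufferedInputStream(s.getInputStream()));
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(s.getOutputStream()))) {
            out.writeUTF(filename);
            out.flush(); // push the request out of the buffer before waiting for the reply
            if (!in.readBoolean()) return false;
            long size = in.readLong();
            try (OutputStream fileOut = new BufferedOutputStream(new FileOutputStream(dest))) {
                copy(in, fileOut, size);
            }
            return true;
        }
    }

    // Copies exactly `remaining` bytes, failing if the stream ends early.
    private static void copy(InputStream in, OutputStream out, long remaining) throws IOException {
        byte[] buf = new byte[8192];
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) throw new EOFException("connection closed before the whole file arrived");
            out.write(buf, 0, n);
            remaining -= n;
        }
    }
}
```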


Future Considerations

        In the future I plan on allowing users to limit the number of simultaneous uploads and downloads. I also plan on allowing users to clear finished uploads and downloads, as well as cancel uploads or downloads in progress. I would also like to implement a file resume feature. Otherwise, that’s it. If you have any further questions, please do not hesitate to contact me at chad@chadlangston.com.