Introduction to Linux

What is Linux

Linux is a free and open source operating system, first released in 1991 and licensed under the GNU General Public License (GPL). The GPL allows anyone to use, modify, and redistribute the software, provided that they pass it on under the same license.

It is the leading operating system for servers and supercomputers. More than 90% of the 500 fastest computers in the world run Linux.

Mac computers are related to Linux: macOS is based on UNIX, and Linux is a UNIX-like operating system.

Depending on the purpose of the Linux machine, it may or may not have the kind of desktop environment that we are familiar with on our personal computers. Linux uses the X Window System to provide the desktop environment.

A popular distribution of the Linux operating system is Ubuntu.

Why do bioinformaticians use Linux?

  • Many core bioinformatics tools are written for Linux.
    • BLAST, CLUSTALW, PHRAP, etc.
    • Many web applications are also hosted on web servers running on Linux machines.
  • Linux supports software development in many different programming languages.
    • Developers are lazy, so creating software that does not require a window is much faster and easier.
  • Multiple users can log in at the same time.
    • A user logging in over the network can do just about anything a user sitting in front of the computer can do, which also means Linux handles multitasking very well.

Remote vs. Local

Logging in with X Windows

The standard user interface for personal computers is a GUI (Graphical User Interface). On Linux, however, it is a command-line interpreter called the shell: simply a prompt that awaits your command. There are several different shells, but the one used most often is called “bash”, which borrows features from several earlier shells.

In cases where a program requires a GUI, you should log in using the -X option of ssh. This opens a tunnel to your computer, allowing all windows to open on your computer. For this to work you need an X11 server installed on your computer (MobaXterm already includes one):

  • Mac – XQuartz (http://xquartz.macosforge.org/landing/)
  • Windows – Xming (http://sourceforge.net/projects/xming/)

172-16-140-235:~ manpreetkatari$ ssh -X msk8@prince.hpc.nyu.edu
msk8@prince.hpc.nyu.edu's password:
Last login: Mon Dec 18 12:47:18 2017 from 172-17-11-36.dynapool.nyu.edu
[msk8@log-1 ~]$ emacs

An Emacs window should pop up on your computer.

This application was launched on the server but is using your window system to show it on your computer.

Simply close the window to exit.
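A quick way to check whether X forwarding is working after you log in is to look at the DISPLAY environment variable, which ssh -X normally sets on the server. The sketch below is only an illustration; the function name check_display is made up for this example.

```shell
# Report whether an X display is available in this session.
# ssh -X normally sets DISPLAY on the remote side (for example localhost:10.0).
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X forwarding looks active: DISPLAY=$DISPLAY"
    else
        echo "DISPLAY is not set; GUI programs will not be able to open windows"
    fi
}

check_display
```

If DISPLAY is not set, log out and log in again with ssh -X, and make sure your local X11 server is running.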

Home Sweet Home

When you first log in, you will be in a directory called your “home directory”:

/home/netid

In this directory you generally have complete control over creating, modifying, and executing files, both here and in any subdirectory you create. To return to your home directory, simply type the command:

cd ~

at the prompt. Unless the appropriate permissions have been changed, you cannot enter anyone else’s home directory or even see what is in it.

Command Line Editing

The command is only executed once you press Enter. Until then you can edit the line using the following keystrokes:

Key                           Action
Backspace (Delete on Macs)    delete the previous character
Left Arrow, Right Arrow       move left and right on the line
Up Arrow, Down Arrow          previous and following command
Ctrl-A                        go to the front of the line
Ctrl-E                        go to the end of the line
Ctrl-D                        delete the character under the cursor
Ctrl-K                        cut everything from the cursor to the end of the line
Ctrl-Y                        paste the text that was last cut
Ctrl-C                        stop a running job

Once you press Enter the program will be executed. When your prompt returns, you know that the program has finished. If the program produces output, it is usually printed on the screen (often referred to as the standard output).

In the example below, date is a command that is executed with no arguments. Many commands/programs accept options, which are provided immediately after the command. In the ls -l example, ls is the command and -l is an option.

[msk8@log-1 ~]$ date
Fri Jan  5 18:03:19 EST 2018

[msk8@log-1 ~]$ cd /scratch/msk8/
[msk8@log-1 msk8]$ ls -l
total 300
drwx------+  4 msk8 msk8   4096 Aug 28 09:37 AG2017
drwxr-xr-x+  2 msk8 cgsb   4096 Jun  9  2016 alignment
drwx------+  2 msk8 cgsb   4096 Apr 17  2017 alignment2017
drwx------+  2 msk8 cgsb   4096 Jun  9  2016 AppliedGenomics
drwxr-xr-x   3 msk8 cgsb   4096 Jun  9  2016 bfx2016

Directing standard output

Instead of letting the output print to the screen, we can save it to a file by using the > sign followed by the file name. This will replace the file, without a warning, if it already exists. To append to an existing file, use >> instead. It is important to mention here that once you overwrite a file, its previous contents are deleted. They are gone. There is no recycling bin to restore them from.

The following command gets details about all users’ home directories and saves them into a file called allusers.txt

[msk8@log-1 msk8]$ ls -l /home/ > allusers.txt
[msk8@log-1 msk8]$ ls -al allusers.txt 
-rw------- 1 msk8 msk8 190798 Jan  5 18:05 allusers.txt
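The difference between > and >> can be tried safely in a throwaway directory. This is a minimal sketch: the file name demo.txt is made up, and the noclobber option at the end is an optional shell setting that is not part of the tutorial's own setup.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first" > demo.txt      # > creates demo.txt (or replaces it)
echo "second" >> demo.txt    # >> appends to the end of demo.txt
cat demo.txt                 # shows both lines

# Optional safety net: with noclobber set, > refuses to overwrite an existing file.
set -o noclobber
if ! echo "third" > demo.txt; then
    echo "overwrite refused, demo.txt is unchanged"
fi
set +o noclobber             # back to the default behaviour
```

With noclobber switched on, the shell prints an error instead of silently replacing the file, which can be a useful guard while you are still learning.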

Command-line completion

In some cases the commands or the file names that you need as arguments can be very long which increases the chance of spelling mistakes.

To prevent such mistakes, simply type enough letters to unambiguously identify the command or file and then press Tab to complete it.

If you don’t know how many letters you need, simply press Tab twice to see all your options.

In the example below, after typing the command and the beginning of the path, the Tab key was pressed twice to get the list shown. The command will not be executed until the Enter key is pressed.

[msk8@log-1 msk8]$ ls /usr/bin/bz
bzcat         bzcmp         bzdiff        bzgrep        bzip2         bzip2recover  bzless        bzmore

Wildcards

In cases where you want to refer to multiple files, you can use * to represent any sequence of characters of any length (including none). You can also use ? to represent exactly one character. In the example below, the first command lists all files/programs that start with bz. The second lists only those that begin with bz followed by exactly three characters, each represented by a ?.

[msk8@log-1 msk8]$ ls /usr/bin/bz*
/usr/bin/bzcat  /usr/bin/bzdiff  /usr/bin/bzip2         /usr/bin/bzless
/usr/bin/bzcmp  /usr/bin/bzgrep  /usr/bin/bzip2recover  /usr/bin/bzmore

[msk8@log-1 msk8]$ ls /usr/bin/bz???
/usr/bin/bzcat  /usr/bin/bzcmp  /usr/bin/bzip2
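The same patterns can be tried on files of your own. The sketch below creates a few empty files with made-up names in a temporary directory and lets the shell expand the wildcards; echo is used only to show what each pattern expands to.

```shell
# Create some empty example files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch bzcat bzcmp bzip2 bzgrep bzmore notes.txt

all_bz=$(echo bz*)        # every name that starts with bz
five_chars=$(echo bz???)  # bz followed by exactly three characters

echo "bz*   matches: $all_bz"
echo "bz??? matches: $five_chars"
```

Note that the shell expands the pattern before the command runs, so the command (here echo) only ever sees the list of matching names.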

Finding Your Way

You will often get lost on the HPC and will need to know where you are, which computer you logged into, or even which account you are logged in as. Below are some simple commands that help you find your way.

[msk8@log-1 msk8]$ whoami
msk8
[msk8@log-1 msk8]$ pwd
/scratch/msk8
[msk8@log-1 msk8]$ hostname
log-1

File manipulation

The commands below are useful for manipulating files and directories. For details on how to use any of them, type man <command>.

Command    Action
mkdir      make a directory
rmdir      remove a directory (only works if the directory is empty)
cd         change directory
pwd        print the present working directory
ls         list the files and directories in a directory. You can use wildcards to look for specific files, and -l to see details such as the permissions of files and directories
cp         copy files and/or directories. Use -r to copy directories recursively
mv         move a file or directory; the source no longer exists afterwards. This can also be used to rename files
rm         remove a file

[msk8@log-1 msk8]$ mkdir temp
[msk8@log-1 msk8]$ cd temp/
[msk8@log-1 temp]$ ls
[msk8@log-1 temp]$ cp ../al
alignment/     alignment2017/ allusers.txt   
[msk8@log-1 temp]$ cp ../allusers.txt ./
[msk8@log-1 temp]$ ls
allusers.txt
[msk8@log-1 temp]$ mv allusers.txt allusers.backup
[msk8@log-1 temp]$ ls
allusers.backup
[msk8@log-1 temp]$ rm allusers.backup 
[msk8@log-1 temp]$ ls
[msk8@log-1 temp]$ cd ../
[msk8@log-1 msk8]$ rmdir temp/
[msk8@log-1 msk8]$ cd temp
-bash: cd: temp: No such file or directory
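The table mentions cp -r, which the session above does not exercise. A minimal sketch with made-up directory and file names is shown below; mkdir -p and rm -r are standard options that do not appear in the table (rm -r removes a directory together with everything inside it, so use it with care).

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p project/data                       # -p also creates the parent directory
echo "ACGT" > project/data/seq.txt

cp -r project project_backup                # copy the directory and everything in it
mv project_backup/data/seq.txt project_backup/data/seq_copy.txt   # rename a file
rm -r project                               # remove the original directory tree

ls -R                                       # only project_backup is left
```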

Permissions

There are three types of permissions that can be assigned to all files, programs, and directories:

  • Read: open the file and copy it
  • Write: edit the file (deleting a file is controlled by write permission on the directory that contains it)
  • Execute: run the commands in the file, or change into the directory if it is a directory

There are also three different levels of users:

  • User – you, the owner of the file
  • Group – a collection of users that belong to the same group
  • Everyone – all other users; not just the people who have accounts on the machine but, if the directory is open to the public, anyone at all

The commands used to change owner, group, and specific permissions are:

  • chown – changes the owner
  • chgrp – changes the group
  • chmod – change read, write, and execute permissions
    • +/- r = read
    • +/- w = write
    • +/- x = execute
    • u = user level
    • g = group level
    • o = others
    • a = all
  • chmod can also take three digits to set permissions, where the value of each digit encodes a specific combination of rwx and the position of the digit assigns it to a level (u, g, o, in that order)
    • 0 = none
    • 1 = execute only
    • 2 = write only
    • 3 = write and execute only
    • 4 = read only
    • 5 = read and execute only
    • 6 = read and write only
    • 7 = read, write and execute

[msk8@log-1 msk8]$ tail allusers.txt > bottomusers.txt
[msk8@log-1 msk8]$ ls -al bottomusers.txt 
-rw------- 1 msk8 msk8 615 Jan  5 18:10 bottomusers.txt
[msk8@log-1 msk8]$ chmod u=+r-wx,g=+r-wx,o=-rwx bottomusers.txt 
[msk8@log-1 msk8]$ ls -al bottomusers.txt 
-r--r----- 1 msk8 msk8 615 Jan  5 18:10 bottomusers.txt
[msk8@log-1 msk8]$ rm bottomusers.txt 
rm: remove write-protected regular file ‘bottomusers.txt’? y
[msk8@log-1 msk8]$ ls -al bottomusers.txt
ls: cannot access bottomusers.txt: No such file or directory

Notice that since we have taken away our own write permission to the file, rm does not delete it automatically. Instead it asks for confirmation before removing the write-protected file. If we had write permission, the file would have been deleted right away.

Another way to represent permissions is to use a numeric code where 4 = read, 2 = write, and 1 = execute. You can represent any combination of permissions by simply adding them. For example, if you want to give someone only read and write permission, you represent it with a 6 (4 + 2). Keeping the order user, group, other, you can represent the full set of permissions with just 3 digits.
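As a worked example of the arithmetic, 640 means 6 (4 + 2, read and write) for the user, 4 (read) for the group, and 0 (nothing) for others. The sketch below applies it to a made-up file and then sets the same permissions with the symbolic form; cut -c1-10 simply keeps the ten permission characters from the ls -l output.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch report.txt

# 6 = 4 (read) + 2 (write) for the user, 4 = read for the group, 0 = nothing for others
chmod 640 report.txt
numeric_perms=$(ls -l report.txt | cut -c1-10)
echo "$numeric_perms"

# The same permissions written symbolically, after clearing everything first
chmod 000 report.txt
chmod u=rw,g=r,o= report.txt
symbolic_perms=$(ls -l report.txt | cut -c1-10)
echo "$symbolic_perms"
```

Both forms leave the file in the same state, so which one to use is mostly a matter of taste: the digits are compact, while the letters make it easy to change a single permission.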

Below I will create a simple bash script with the sole purpose of saying hello to the user. The first line of a script normally contains information about which program should be used to execute the script. However, in this case, since we are using a bash shell, it will by default be executed by bash.
I will also use the environment variable $USER to determine the name of the user. More on variables later. Note that because the outer quotes in the echo command are double quotes, the shell expands $USER at the moment the file is written, so the script itself contains the literal user name. Then we will give ourselves permission to execute the script.

[msk8@log-1 msk8]$ echo "echo 'Hello $USER'" > hello.sh
[msk8@log-1 msk8]$ ls -al hello.sh 
-rw------- 1 msk8 msk8 18 Jan  5 18:44 hello.sh
[msk8@log-1 msk8]$ ./hello.sh
-bash: ./hello.sh: Permission denied
[msk8@log-1 msk8]$ chmod 755 hello.sh 
[msk8@log-1 msk8]$ ls -al hello.sh 
-rwxr-xr-x 1 msk8 msk8 18 Jan  5 18:44 hello.sh
[msk8@log-1 msk8]$ ./hello.sh 
Hello msk8
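If you would rather have the script look up the user name each time it runs, the text $USER has to reach the file unexpanded. One way to do that, sketched here as a variation rather than as the original session, is a quoted here-document; the #!/bin/bash first line is the conventional way of naming the program that should run the script, and it assumes bash is installed at /bin/bash.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The quoted 'EOF' stops the shell from expanding $USER while writing the file.
cat > hello.sh <<'EOF'
#!/bin/bash
echo "Hello $USER"
EOF

chmod u+x hello.sh    # give ourselves execute permission
cat hello.sh          # the file still contains the text $USER
./hello.sh            # $USER is looked up now, when the script runs
```

Written this way, the same script greets whoever runs it, instead of always greeting the person who created it.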

History

Your shell saves all your commands and you can access them using the up and down keys.

Typing the command “history” returns all the commands you have entered, each with a number assigned to it.

You can run a specific one again by typing ! (also called “bang”) followed by its history number.

!! will perform the most recent command.

In the Permissions example above, tail was used to get the last lines of the file allusers.txt, and the > sign redirected the output to a new file called bottomusers.txt. In the example below, the | sign (a pipe) sends the output of history to grep, which keeps only the lines containing hello; !1063 then reruns command number 1063.

[msk8@log-1 msk8]$ history | grep hello
 1061  echo "echo 'Hello $USER'" > hello.sh
 1062  ls -al hello.sh 
 1063  ./hello.sh
 1064  chmod 755 hello.sh 
 1065  ls -al hello.sh 
 1066  ./hello.sh 
 1071  history | grep hello

[msk8@log-1 msk8]$ !1063
./hello.sh
Hello msk8

Transferring Files

Quite often we will need to:

  • Download files from the internet to the server
  • Transfer files from our computer to the server
  • Transfer files from the server to our computer

An easy way to download files is to use the wget command. Simply type wget followed by the URL of the file you want to download and it will be placed in your present working directory.

wget http://some.where/some.file

There are several ways to transfer files to a server. The most reliable and consistent way is to use scp. scp is a combination of the cp copy command and ssh command for connecting securely. The format looks like this:

To send a file from your current directory to the server from your computer:

scp file username@remote.host:path/to/file

To retrieve a file from the server and put it in your current directory:

scp username@remote.host:path/to/file ./file

One major advantage of using MobaXterm on Windows machines is that it comes with some basic Linux commands, including scp.

In the commands below, we will first log into PRINCE and make sure we don’t already have the file we are going to download. Then we will log out of PRINCE and go back to the terminal on our personal computer. Then we will download a file to our computer using wget and transfer it to PRINCE. Finally, we will log into PRINCE again to make sure the file arrived safely.

KATARI1009:~ manpreetkatari$ ssh msk8@prince.hpc.nyu.edu
msk8@prince.hpc.nyu.edu's password: 
Last login: Fri Jan  5 11:30:47 2018

[msk8@log-0 ~]$ ls Ath*
ls: cannot access Ath*: No such file or directory

[msk8@log-0 ~]$ exit
logout
Connection to prince.hpc.nyu.edu closed.

KATARI1009:~ manpreetkatari$ wget https://learn.gencore.bio.nyu.edu/wp-content/uploads/2018/01/Ath-t10.txt.gz
--2018-01-05 19:10:09--  https://learn.gencore.bio.nyu.edu/wp-content/uploads/2018/01/Ath-t10.txt.gz
Resolving learn.gencore.bio.nyu.edu... 128.122.4.236
Connecting to learn.gencore.bio.nyu.edu|128.122.4.236|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 468245 (457K) [application/x-gzip]
Saving to: ‘Ath-t10.txt.gz’

Ath-t10.txt.gz                100%[================================================>] 457.27K  --.-KB/s    in 0.04s   

2018-01-05 19:10:10 (12.2 MB/s) - ‘Ath-t10.txt.gz’ saved [468245/468245]

KATARI1009:~ manpreetkatari$ scp Ath-t10.txt.gz msk8@prince.hpc.nyu.edu:Ath-t10.txt.gz

msk8@prince.hpc.nyu.edu's password: 
Ath-t10.txt.gz                                    100%  457KB 457.3KB/s   00:00    

KATARI1009:~ manpreetkatari$ ssh msk8@prince.hpc.nyu.edu

msk8@prince.hpc.nyu.edu's password: 
Last login: Fri Jan  5 18:01:59 2018 from katari1009.bio.nyu.edu

[msk8@log-1 ~]$ ls -al Ath-t10.txt.gz 
-rw-r----- 1 msk8 msk8 468245 Jan  5 19:10 Ath-t10.txt.gz

The : after the host name is very important because it tells scp that the name before it is a server and not a file name. The : by itself puts the file in your home directory, but you can specify a path after it if you want to.
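To make the role of the : concrete, the sketch below prints a few scp commands without running them. The account, host name, and the /scratch/netid/ and results paths are placeholders, not real locations; remove the leading echo to actually perform a transfer.

```shell
# Placeholder values: replace with your own account, host and paths.
remote="username@remote.host"

# echo prints each command instead of running it (a dry run).
echo scp Ath-t10.txt.gz "${remote}:"                  # nothing after the colon: remote home directory
echo scp Ath-t10.txt.gz "${remote}:/scratch/netid/"   # explicit remote directory after the colon
echo scp -r results "${remote}:results"               # -r copies a whole directory
```

Leaving out the colon entirely would make scp treat the last argument as a local file name and simply make a local copy.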

Controlling Jobs

The following commands and keyboard shortcuts can come in handy when you need to cancel, suspend, or start a job.

Command    Action
Ctrl-C     terminate the currently running job
Ctrl-Z     suspend the currently running job
bg         resume a suspended job in the background
fg         bring a background or suspended job to the foreground
&          put at the end of a command to start it in the background immediately
jobs       list the jobs started from this shell (suspended, running, and terminating)
top        display the running processes and their resource usage
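Ctrl-Z, bg, and fg only make sense at an interactive prompt, but & and jobs can be tried in a few lines. The sketch below uses sleep as a stand-in for a long-running program; $!, kill, and wait are standard shell features that are not listed in the table above.

```shell
sleep 30 &                    # & starts the command in the background
bg_pid=$!                     # $! holds the process ID of the last background job
jobs                          # lists the background job and its state
kill "$bg_pid"                # ask the background job to terminate
wait "$bg_pid" 2>/dev/null    # wait until it has really ended
echo "background job $bg_pid has ended"
```

At an interactive prompt you could instead type fg to bring the sleep to the foreground and then press Ctrl-C to stop it.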

Commands for manipulating and querying files

Some more cool commands

Command      Action
less/more    read through a file one screen at a time without loading the entire file. Press the spacebar to continue or q to quit
touch        create an empty file
head         show the first few lines of a file
tail         show the last few lines of a file
cat          print the contents of the file(s)
grep         search for patterns in a file or files
cut          extract selected columns from each line of a file
comm/diff    compare files and show the differences between them. comm requires its input files to be sorted; diff does not
split        split a file into smaller files based on the options
sort         sort a file based on the options selected
wc           count the lines, words, and characters in a file
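Several of these commands are most useful in combination. The sketch below builds a small tab-separated table with made-up gene names, chromosomes, and lengths, and then queries it; the file name genes.tsv and its contents are invented purely for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
tab=$(printf '\t')

# A made-up table: gene name, chromosome, length
printf 'gene1\tchr1\t100\ngene2\tchr2\t250\ngene3\tchr1\t75\n' > genes.tsv

head -n 2 genes.tsv                      # first two lines
chr1_count=$(grep -c 'chr1' genes.tsv)   # number of lines that mention chr1
gene_names=$(cut -f1 genes.tsv)          # first column only (cut splits on tabs by default)
# Sort numerically on column 3, largest first, and keep the name from the top line
longest=$(sort -t "$tab" -k3,3nr genes.tsv | head -n 1 | cut -f1)
line_count=$(wc -l < genes.tsv)          # number of lines in the file

echo "lines: $line_count, chr1 genes: $chr1_count, longest: $longest"
```

The | sign passes the output of one command to the next, which is how small tools like sort, head, and cut are chained into a single query.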

Some useful information about Linux

Environment variables and PATH

All variables that are set in your environment can be found by using

env

The variable that is most important to us is PATH. PATH is a colon-separated list of directories in which the shell looks for commands. To see the contents of the variable type:

echo $PATH
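To see how PATH is used in practice, the sketch below lists its directories one per line, asks the shell where it finds ls, and then adds a temporary directory containing a tiny script. The script name mytool is made up for this example, and the change to PATH only lasts for the current shell session.

```shell
# Print each directory in PATH on its own line
echo "$PATH" | tr ':' '\n'

# Show which file the shell would run for ls
command -v ls

# A made-up tool in a throwaway directory
tooldir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$tooldir/mytool"
chmod u+x "$tooldir/mytool"

PATH="$tooldir:$PATH"        # put the new directory at the front of PATH
mytool_output=$(mytool)      # found by name because its directory is now in PATH
echo "$mytool_output"
```

Directories are searched from left to right, so placing a directory at the front of PATH gives its programs priority over programs of the same name elsewhere.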