Stata
Stata is data analysis and statistical software.[1]
Installed Versions
On Picotte
Stata 17 is installed. Stata/MP 48-core is installed.
- Stata 48-core: limited to 10 seats
NOTE: Please update your job scripts to remove the license request "#SBATCH --license=stata:1" if you do not need it. Current job scripts with that line may fail, as the license name has been changed to "stata48".
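To locate job scripts that still carry the old license request, something like the following may help. It is only a sketch: the function name find_old_stata_license is hypothetical, and it assumes that job scripts use the .sh extension.

```shell
#!/bin/bash
# Hypothetical helper: list job scripts under a directory that still
# request the old license name "stata" (rather than "stata48").
find_old_stata_license() {
    local dir="${1:-.}"
    # Matches "--license=stata:" and "--licenses=stata:"; the colon
    # right after "stata" keeps "stata48:" from matching.
    grep -rlE --include='*.sh' '^#SBATCH[[:space:]]+--licenses?=stata:' "$dir"
}

# Usage (assumption: job scripts live under ~/jobs):
#   find_old_stata_license ~/jobs
```

The function prints one file name per line and prints nothing if no script uses the old name.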
Load the appropriate modulefile for the version you intend to use:
stata/mp48/17
Documentation
PDF documentation is available on Picotte. The files are in the directory
${STATADIR}/docs/
available once the modulefile is loaded.
Documentation is also available online: https://www.stata.com/features/documentation/
Personal Setup
Your own ADO files go into:[2]
~/ado/personal/
Running
Command Line Version
To run the command line version:
[juser@picotte001 ~]$ module load stata/mp48/17
[juser@picotte001 ~]$ stata-mp
___ ____ ____ ____ ____ ®
/__ / ____/ / ____/ 17.0
___/ / /___/ / /___/ MP—Parallel Edition
Statistics and Data Science Copyright 1985-2021 StataCorp LLC
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC https://www.stata.com
979-696-4600 stata@stata.com
Stata license: 10-user 48-core network, expiring 24 Feb 2024
Serial number: 501709314736
Licensed to: University Research Computing Facility
Drexel University
Notes:
1. Unicode is supported; see help unicode_advice.
2. More than 2 billion observations are allowed; see help obs_advice.
3. Maximum number of variables is set to 5,000; see help set_maxvar.
.
Jupyter Notebook
See: Stata in Jupyter
Graphical User Interface (GUI) Version
N.B. Running Stata in a Jupyter Notebook (above) will be a more responsive user experience.
To run the GUI version, you must have an X11 server installed on your computer. See:
The command to use is xstata or xstata-mp for the multithreaded version.
Note that if you do this, you will be running Stata on the login node, which is a shared resource where multiple people may be logged in and doing work simultaneously.
Picotte
To run both the terminal and GUI versions, see: Running GUI Applications on Compute Nodes
The difference is that for the terminal version, you run stata-mp, while for the GUI version, you run xstata-mp.
Submitting Jobs on Picotte
For long-running computations, you will want to write a job script to be submitted as a job on the cluster. Since the limit on the number of CPUs which may be used simultaneously by any one group is 512, it will be very easy to exhaust all Stata licenses with job submissions.
Please see more detail in Writing Slurm Job Scripts
Requesting License
Only stata/mp48 jobs require a license. Each job should request one license. The license on Picotte is limited to no more than 10 simultaneous uses.
#SBATCH --licenses=stata48:1
N.B. the license name has been changed to "stata48".
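Because only 10 seats exist, it can be useful to check how many are free before submitting. The sketch below parses the output of the standard Slurm command "scontrol show licenses". The helper name free_seats is hypothetical, and the record layout it expects (a "LicenseName=" field followed by "Total= Used= Free=" fields) is an assumption that may differ between Slurm versions.

```shell
#!/bin/bash
# Hypothetical helper: read `scontrol show licenses` output on stdin and
# print the number of free seats for the named license.
free_seats() {
    local name="$1"
    awk -v lic="LicenseName=${name}" '
        # Each record starts with a LicenseName= field; track whether
        # the current record is the one we were asked about.
        /^LicenseName=/ { found = ($1 == lic) }
        found {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^Free=/) { sub(/^Free=/, "", $i); print $i; exit }
        }
    '
}

# Usage on the cluster:
#   scontrol show licenses | free_seats stata48
```

If the license name is not present in the input, nothing is printed.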
Stata/MP
Stata/MP, which runs multithreaded, is also available. It is provided by the command stata-mp. There are two editions of Stata/MP on Picotte: one licensed for 4 cores (unlimited number of seats), and one licensed for 48 cores (limit of ten seats).
N.B. More cores (threads) do not guarantee better performance; frequently the reverse is true.
NOTE: you must use "set processors NN" in your .do file, with NN set to the exact number of slots requested by the job. The number of slots (each slot is one processor core) is requested by the lines:
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=48
#SBATCH --licenses=stata48:1
module load stata/mp48/17
In this example, the number of CPU cores is 48.
You may also read the environment variable SLURM_CPUS_PER_TASK to use in the "set processors NN" command in your Stata .do file:
local p : env SLURM_CPUS_PER_TASK
set processors `p'
SLURM_CPUS_PER_TASK is set by Slurm, the job scheduler, to be the value requested by "--cpus-per-task".
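Slurm sets SLURM_CPUS_PER_TASK only when "--cpus-per-task" is actually specified, so a job script that omits the option would hand Stata an empty value. One possible guard, placed in the job script before Stata is started, is sketched below. The helper name stata_cores and the fallback of 1 core are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the core count Stata should use.
# Falls back to 1 when SLURM_CPUS_PER_TASK is unset or empty.
stata_cores() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

# Export the value so that the .do file always reads a number.
export SLURM_CPUS_PER_TASK="$(stata_cores)"
```

With this in place, the .do file can read the environment variable unconditionally.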
NOTE ON PERFORMANCE: More cores do not necessarily mean a faster run. Some functions/routines may be parallelizable, others may not be. You will need to benchmark your specific computation to find the optimal number of CPU cores to use.
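One way to set up such a benchmark is to submit the same job script with several core counts and compare the run times. The sketch below only prints the sbatch commands so they can be reviewed first. The helper name bench_commands and the job name prefix stata-bench are hypothetical. It relies on the fact that options given on the sbatch command line take precedence over the #SBATCH lines in the script.

```shell
#!/bin/bash
# Hypothetical helper: print one sbatch command per core count to try.
# Arguments: the job script, followed by the core counts.
bench_commands() {
    local script="$1"; shift
    local n
    for n in "$@"; do
        printf 'sbatch --cpus-per-task=%s --job-name=stata-bench-%s %s\n' \
            "$n" "$n" "$script"
    done
}

# Usage: review the commands, then pipe them to a shell to submit.
#   bench_commands teststata.sh 1 2 4 8
#   bench_commands teststata.sh 1 2 4 8 | bash
```

Each submitted job holds one license seat while it runs. Jobs started from the same directory write to the same Stata log file, so run each from its own directory or submit them one at a time.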
Example Job for Picotte (Slurm)
This is the Stata script to be run, named testing.do:
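// test computation - testing.do
clear*
set rmsg on
set obs 100000
// use as many cores as the job requested via --cpus-per-task
local p : env SLURM_CPUS_PER_TASK
set processors `p'
// generate five uniform random covariates i1..i5
forval n = 1/5 {
    g i`n' = runiform()
}
// binary outcome with success probability 0.3
g dv = rbinomial(1,.3)
memory
qui logit dv i*
qui xtmixed dv i*
*with bootstrap:
qui bs, reps(2000): logit dv i*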
This is the job script, named teststata.sh:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=128G
#SBATCH --account=urcfadmprj
#SBATCH --time=4:00:00
#SBATCH --licenses=stata48:1
module load stata/mp48/17
# set the Stata temporary directory to the job-specific temporary directory
# this directory will be automatically deleted at the end of the job
export STATATMP=$TMP
stata-mp -b do testing.do
To submit the job:
[juser@picotte001]$ sbatch teststata.sh
N.B. You may see the following warning/error message in the Slurm output file; it can be safely ignored:
stata-mp: /lib64/libtinfo.so.5: no version information available (required by stata-mp)
Outputs
Stata, by default, produces a log file named after the .do file. So, running the Stata DO script something.do produces the log something.log
If the same DO script is run multiple times, later runs will overwrite the log from earlier runs.
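To keep the logs from earlier runs, the job script can rename the log once Stata has finished. The following is a sketch: the helper name keep_log is hypothetical, and it assumes the log was written to the current working directory. SLURM_JOB_ID is set by Slurm inside a job; outside a job the placeholder "manual" is used.

```shell
#!/bin/bash
# Hypothetical helper: rename <name>.log to <name>.<jobid>.log so that a
# later run of the same .do file does not overwrite it.
keep_log() {
    local base
    base="$(basename "$1" .do)"
    local id="${SLURM_JOB_ID:-manual}"
    if [ -f "${base}.log" ]; then
        mv "${base}.log" "${base}.${id}.log"
    fi
}

# Usage, as the last lines of a job script:
#   stata-mp -b do testing.do
#   keep_log testing.do
```

Each run then leaves behind its own log, for example testing.12345.log for job 12345.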
See Also
- Stata Support and Online Resources
- If you are interested in converting from Stata to R: http://dss.princeton.edu/training/ in particular this PDF http://dss.princeton.edu/training/RStata.pdf
- R vs. Stata benchmark