Stata Tips #15 – Publication-ready graphics
More data is available now than at any point in history, and a simple graph can often go a long way toward presenting complex relationships in the data. Stata offers an impressive set of graphing options. In this post we look at three graphics features. The first is the new transparency feature in Stata 15. The second and third are user-written additions: the blindschemes graph schemes and the coefplot command.
Stata 15 introduces a new feature that lets users make graph elements transparent instead of only opaque. Using the city_data file, let us draw a simple scatter plot of the adult unemployment rate against the share of jail capacity in use (jail_use) across 75 cities (note: the data are fictitious and were created for illustrative purposes only).
twoway (scatter adult_unemployment_rate jail_use)
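In the graph above, the markers are drawn fully opaque, so points that overlap hide one another. We now add the option mcolor(%60) to show the new transparency feature. This option draws the markers at 60% opacity: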
twoway (scatter adult_unemployment_rate jail_use, mcolor(%60))
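We can also change the color and how transparent the markers are. The number after the % sign is the opacity, from 0 (invisible) to 100 (fully opaque). So mcolor(%20) makes the markers much more transparent than mcolor(%60):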
twoway (scatter adult_unemployment_rate jail_use, mcolor(%20))
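To compare opacity levels side by side, the sketch below draws the same scatter at three opacity levels and combines the results into one graph. It assumes Stata 15 or later. It uses Stata's bundled auto dataset as a stand-in for city_data. The graph names op100, op50 and op15 are arbitrary illustrative choices.

```stata
* Stand-in data: Stata's bundled auto dataset (Stata 15+ needed for %opacity)
sysuse auto, clear

* Same scatter at three opacity levels; %100 is fully opaque
twoway scatter mpg weight, mcolor(navy%100) title("Opacity 100%") name(op100, replace)
twoway scatter mpg weight, mcolor(navy%50) title("Opacity 50%") name(op50, replace)
twoway scatter mpg weight, mcolor(navy%15) title("Opacity 15%") name(op15, replace)

* Show the three stored graphs next to each other
graph combine op100 op50 op15, rows(1)
```

At low opacity, areas where markers pile up look darker than isolated points. Revealing that overplotting is the main reason to use transparency in a scatter plot.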
To change the color as well, put a color name in front of the % sign. Here we use green at 60% opacity:
twoway (scatter adult_unemployment_rate jail_use, mcolor(green%60))
We can add a second series, youth_unemployment_rate, and give each series its own color and transparency level:
twoway (scatter adult_unemployment_rate jail_use, mcolor(green%60)) (scatter youth_unemployment_rate jail_use, mcolor(red%10))
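Transparency is not limited to markers. Fill and line colors accept the same color%# syntax, which helps with overlaid areas such as two histograms. The sketch again uses the auto dataset as a stand-in. The bin settings width(2) and start(10) are illustrative choices that keep the two histograms' bins aligned. The /// continuation works in do-files.

```stata
* Stand-in data: Stata's bundled auto dataset (Stata 15+ needed for %opacity)
sysuse auto, clear

* Two overlapping histograms; 40% opacity lets the overlap show through
twoway (histogram mpg if foreign == 0, width(2) start(10) color(navy%40))   ///
       (histogram mpg if foreign == 1, width(2) start(10) color(maroon%40)), ///
       legend(order(1 "Domestic" 2 "Foreign"))
```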
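The second feature is blindschemes, a user-written collection of graph schemes. It includes the plottig scheme used below as well as colorblind-friendly schemes. First, install it from SSC by typing: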
ssc install blindschemes
Now let us run the same command and add the option scheme(plottig):
twoway (scatter adult_unemployment_rate jail_use, mcolor(green%60)) (scatter youth_unemployment_rate jail_use, mcolor(red%10)), scheme(plottig)
The same command now produces a completely different look, simply because the graph is drawn under a different scheme.
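Instead of adding scheme(plottig) to every command, you can set the scheme as the session default. The sketch below assumes blindschemes is installed and uses the auto dataset. It then exports the graph as a high-resolution PNG, which is usually what a publication needs. The file name mpg_weight.png and the 2400-pixel width are arbitrary. Adding the permanently option to set scheme keeps the choice across sessions.

```stata
* Requires blindschemes: ssc install blindschemes
sysuse auto, clear

* Make plottig the default scheme for the rest of this session
set scheme plottig

* No scheme() option needed now
twoway scatter mpg weight, mcolor(%60)

* Export for print; for PNG files, width() is in pixels
graph export "mpg_weight.png", width(2400) replace
```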
The third feature is coefplot, a user-written command that presents regression coefficients in a graph rather than as numbers in a table. This way of presenting coefficients is gaining popularity among researchers and is often easier to follow during presentations.
First, install the command from SSC by typing:
ssc install coefplot
Open the data file coefplot. The data set includes demographics and two outcome variables indicating whether a person works full time (working_full) or part time (working_part). We want to look at the relationship between the level of street violence (crime) and the likelihood of working. After defining global macros that hold the names of the control variables, we estimate two probit models, starting with full-time work:
global demographics age male first_children son_daughter_to_head
global household male_household age_head_household father_alive mother_alive edu_father_primary edu_father_secondary edu_mother_primary edu_mother_secondary nb_rooms_for_sleeping nb_hh_members urban
global violence violence_level
probit working_full violence_level $demographics $household i.state_id
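Global macros are plain text substitution: Stata replaces $name with the stored text before it runs the command. Here is a minimal sketch using the auto dataset and a hypothetical global named controls.

```stata
sysuse auto, clear

* Store a list of variable names in a global macro
global controls weight length

* $controls is expanded to "weight length" before each command runs
display "$controls"
summarize mpg $controls
```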
After the first regression, we store the estimates under the name first:
estimates store first
We now run the same regression with working_part as the dependent variable and store the estimates under the name second:
probit working_part violence_level $demographics $household i.state_id
estimates store second
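Stored estimates can be recalled by name at any time, so it is easy to check them before plotting. The sketch below uses the auto dataset. Here expensive is a hypothetical binary outcome created only for illustration. The code estimates two probit models with the same regressors, stores them as first and second, and lists their coefficients side by side.

```stata
sysuse auto, clear

* Hypothetical second binary outcome, for illustration only
generate byte expensive = price > 6000

* Same regressors, two different binary outcomes
probit foreign mpg weight
estimates store first
probit expensive mpg weight
estimates store second

* Compare the stored results in one table (coefficients and standard errors)
estimates table first second, b se
```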
We can then plot the coefficients from both regressions in one graph:
coefplot first, bylabel(Working Full Time) || second, bylabel(Working Part Time) ||, keep(age male edu_father_primary edu_mother_primary edu_father_secondary edu_mother_secondary violence_level) coeflabels(age = "Age" male = "Male" edu_father_primary = "Father Education Primary" edu_father_secondary = "Father Education Secondary" edu_mother_primary = "Mother Education Primary" edu_mother_secondary = "Mother Education Secondary" violence_level = "Level of Violence") legend(off) mcolor(black*.7) mfcolor(white) lcolor(black*.7) xline(0) graphregion(fcolor(white)) plotregion(fcolor(gray*.1)) ciopts(lcolor(black*.7) lwidth(*3)) msize(medium) ci(95) aspect(2) byopts(row(1))
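The white-filled dot marks each coefficient, and the dark bar is its 95% confidence interval. The graph shows that the coefficient on the level of violence is around 0.4 for working full time and is statistically significant, because its interval does not cross the zero line. For working part time, the coefficient is negative and also statistically significant. Keep in mind that these are probit coefficients, measured on the scale of the latent index rather than as changes in the probability of working.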
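The bars are standard confidence intervals. For a probit model, which has no residual degrees of freedom, the 95% interval is the coefficient plus or minus invnormal(0.975), about 1.96, standard errors. The sketch below uses the auto dataset as a stand-in and computes that interval by hand for one coefficient. The result should match the interval probit reports in its own output table. The scalar names are arbitrary.

```stata
sysuse auto, clear
probit foreign mpg weight

* Critical value for a two-sided 95% normal-based interval (about 1.96)
scalar zcrit = invnormal(0.975)

* Interval for the mpg coefficient: b +/- zcrit * se
scalar ci_lo = _b[mpg] - scalar(zcrit)*_se[mpg]
scalar ci_hi = _b[mpg] + scalar(zcrit)*_se[mpg]
display "mpg: b = " _b[mpg] ", 95% CI = [" scalar(ci_lo) ", " scalar(ci_hi) "]"
```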
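The keep() option chooses which covariates are shown on the graph, and coeflabels() gives them readable labels. The other options control the design of the graph. For example, xline(0) draws the vertical reference line at zero, and byopts(row(1)) arranges the two subgraphs in a single row. Feel free to experiment with these options to build your own design.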
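When a model has many controls, it can be shorter to say what to hide than what to show. coefplot's drop() option is the complement of keep(). The sketch below assumes coefplot is installed and uses the auto dataset. It hides only the constant. The stored-estimate name m1 is arbitrary.

```stata
* Requires coefplot: ssc install coefplot
sysuse auto, clear
probit foreign mpg weight
estimates store m1

* Hide the constant instead of listing every covariate to keep
coefplot m1, drop(_cons) xline(0) coeflabels(mpg = "Miles per gallon" weight = "Weight (lbs.)")
```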