Daily Post #5- 6/13/2024

Varun Vuppaladadiyam
Jun 13, 2024

Howdy!

I didn't get a chance to post yesterday because I ran out of time, but yesterday was mostly SAS, much like today. My focus in SAS is primarily on data replication for clinical trials, so I'm learning more commands that do what I want them to! A lot of today and yesterday was spent drilling each command and getting the syntax into my head. "Memorize" is the wrong word, though: I'm not memorizing commands for their own sake, but making sure that I have the logic of SAS down.

I learned the following commands in SAS, most of which are meant to replicate tables on a line-by-line basis:

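data <name>;
set <data source>;
where <condition>;
output; /* write the row under its original group */
<condition variable>=3; /* I chose 3 because the hypothetical condition has 2 groups of data and I want a third group for the total */
output; /* write the same row again, now as part of the total */
keep <variables you need to make table>;
run;

/* I used the output command twice to create 3 groups of data: the 2 original groups plus the total */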
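To convince myself the double output really works, here is a minimal sketch on toy data. The dataset names (arms, arms_tot) and variables (usubjid, trt) are made up for illustration and are not from the real study data.

```sas
/* Hypothetical toy data: trt is the treatment arm (1 or 2) */
data arms;
input usubjid $ trt;
datalines;
S01 1
S02 1
S03 2
;
run;

data arms_tot;
set arms;
output;  /* keep the row under its original arm */
trt=3;   /* relabel the same row as the total */
output;  /* write it a second time */
run;

/* arms_tot has 6 rows: 2 with trt=1, 1 with trt=2, and 3 with trt=3 */
proc print data=arms_tot noobs;
run;
```

Every input row is written twice, so the total group (3) always has as many rows as groups 1 and 2 combined.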

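proc sql noprint;
select count(*) into :N1 - :N3 from <name> group by <condition variable> order by <condition variable>;
quit;
%put &N1 &N2 &N3;

/* I want to calculate percentages later down the road, so I store the count of each group in the macro variables N1 to N3 (this assumes the counts come from the data step above). The put line just prints them to the log as a check. */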
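Here is a small sketch of how group counts can be stored in macro variables with proc sql, again on made-up data (the dataset grp and the variable trt are assumptions for illustration).

```sas
/* Hypothetical toy data: 2 rows in group 1, 1 row in group 2, 3 rows in group 3 */
data grp;
input trt;
datalines;
1
1
2
3
3
3
;
run;

proc sql noprint;
/* one count per group, stored in N1, N2, N3 in group order */
select count(*) into :N1 - :N3
from grp
group by trt
order by trt;
quit;

/* the log should show the counts 2, 1, and 3 */
%put N1=&N1 N2=&N2 N3=&N3;
```

The order by is there so that N1 lines up with group 1, N2 with group 2, and so on.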

proc sql;
create table <table name> as
select "<Name>" as col1, count(*) as cont, <variable of interest>
from <datasource>
where <data conditions>
group by <variable of interest>;
quit;

/* Here I used proc sql to generate a table. Proc sql is cool in that it lets me name the table and pick the variables it uses to build it. I also used group by to get one row per group, so I can see the breakdown for each group. The variable of interest is the same as the condition variable of the data step. */

data line1sum;
set <table name>;
length <placeholder for variable of interest> $ 10;
if <variable of interest>=1 then do;
pct= strip(put(cont, 8.))||'('||strip(put(cont/&N1, percent8.1))||')';
<placeholder for variable of interest> ="<group name1>";
end;
else if <variable of interest>=2 then do;
pct= strip(put(cont, 8.))||'('||strip(put(cont/&N2, percent8.1))||')';
<placeholder for variable of interest> ="<group name2>";
end;
else if <variable of interest>=3 then do;
pct= strip(put(cont, 8.))||'('||strip(put(cont/&N3, percent8.1))||')';
<placeholder for variable of interest> ="<group name3>";
end;
ord=1;
drop cont;
run;

/* Here I used the proc sql table as the data source, since it already gave me the counts, and built a data step that turns each count into a "count(percentage)" string for each value of the variable of interest. The put function needs a format, so I use 8. for the count and percent8.1 for the ratio, then strip the blanks and concatenate the pieces. This makes a single line of the table, where the first column gives the category name, followed by the breakdown for each subgroup. */
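A minimal sketch of the count-and-percentage string on toy numbers. The group sizes, the dataset counts, and the variable trt are all hypothetical; in the real code the sizes would come from the macro variables created earlier.

```sas
/* Hypothetical group sizes, hard-coded here for illustration */
%let N1=40;
%let N2=50;

data counts;
input trt cont;
datalines;
1 10
2 20
;
run;

data fmt;
set counts;
length pct $ 20;
/* put() turns numbers into text, strip() removes the padding blanks */
if trt=1 then pct=strip(put(cont, 8.))||'('||strip(put(cont/&N1, percent8.1))||')';
else if trt=2 then pct=strip(put(cont, 8.))||'('||strip(put(cont/&N2, percent8.1))||')';
run;

/* trt=1 should give 10(25.0%) and trt=2 should give 20(40.0%) */
proc print data=fmt noobs;
run;
```

The percent8.1 format multiplies the ratio by 100 and adds the percent sign, so there is no need to do that by hand.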

proc transpose data=line1sum out=line1;
id <variable of interest>;
var pct;
by col1;
run;

/* Proc transpose is used to turn the rows into columns, so the table gives us a breakdown where each value of the variable of interest is a column instead of a row, and we can see the counts along with the percentages. We did it by col1 so that every data point for that category gets transposed onto one line. */
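A small sketch of what the transpose does to one table line, on made-up data. The names long, wide, and trt are hypothetical, and the prefix= option is my own addition so that the new columns get readable names (trt1, trt2, trt3).

```sas
/* Hypothetical long-format data: one row per group */
data long;
length col1 $ 20 pct $ 20;
input col1 $ trt pct $;
datalines;
Enrolled 1 10(25.0%)
Enrolled 2 20(40.0%)
Enrolled 3 30(33.3%)
;
run;

/* one output row per col1 value, one column per trt value */
proc transpose data=long out=wide(drop=_name_) prefix=trt;
by col1;
id trt;
var pct;
run;

/* wide has a single row: col1, trt1, trt2, trt3 */
proc print data=wide noobs;
run;
```

The by statement needs the data sorted by col1; here there is only one category, so that holds trivially.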


proc sort data=<datasource>;
by <variable of interest>;
run;

/* done so that proc freq can use a by statement, which needs sorted data */

proc freq data=<datasource>;
tables <condition we want breakdown for>/out=procfreq;
by <variable of interest>;
where <more specific breakdown of condition> and <condition we want breakdown for> ne ' ';
run;
/* We run proc freq on the condition we want the breakdown for, via the tables statement, and save the result as a new table called procfreq. This proc freq gives us a breakdown grouped by the variable of interest, and the where condition both narrows it down to a very specific breakdown and gets rid of missing values. The output table stores the counts in a variable called COUNT. */
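To see what the out= table from proc freq actually contains, here is a quick sketch using sashelp.class, the sample dataset that ships with SAS. The names class and agefreq are made up for illustration.

```sas
/* sort a copy first so the by statement works */
proc sort data=sashelp.class out=class;
by sex;
run;

/* noprint skips the printed report and only writes the output table */
proc freq data=class noprint;
tables age / out=agefreq;
by sex;
run;

/* agefreq has the columns Sex, Age, COUNT, and PERCENT */
proc print data=agefreq noobs;
run;
```

The counts land in COUNT and the within-group percentages in PERCENT, so any later data step has to refer to those names.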

data line2sum;
length <placeholder for variable of interest> $ 10;
set procfreq;
if <variable of interest>=1 then do;
pct= strip(put(count, 8.))||'('||strip(put(count/&N1, percent8.1))||')';
<placeholder for variable of interest> ="<group name1>";
end;
else if <variable of interest>=2 then do;
pct= strip(put(count, 8.))||'('||strip(put(count/&N2, percent8.1))||')';
<placeholder for variable of interest> ="<group name2>";
end;
else if <variable of interest>=3 then do;
pct= strip(put(count, 8.))||'('||strip(put(count/&N3, percent8.1))||')';
<placeholder for variable of interest> ="<group name3>";
end;
ord=2;
drop count percent;
run;

/* We do a similar step to the first summary data step, as we want to give percentages to each column. The difference is that the proc freq output calls its count variable COUNT, so that is what we use here. Because we're using the procfreq table as the data source, this gives us the breakdown in terms of the variable of interest. We still have to transpose it, because this output has one row per value of the variable of interest, when we want each value as a column showing the total for the condition. */

proc sort data=line2sum;
by <condition we want breakdown for>;
run;

proc transpose data=line2sum out=line2;
id <variable of interest>;
var pct;
by <condition we want breakdown for>;
run;

/* Since we want a breakdown for the condition that we're interested in, we say by <condition we want breakdown for>, which is the key difference between this code and the prior transpose. The by statement needs the data sorted by that condition, hence the proc sort first. We use line2sum as the input because we want this to become line 2 of the table. */

I'm spending a lot of time this week making sure that my SAS skills are on point. This weekend will have to be dedicated to more Python and ML so those skills don't get rusty. I'm going to continue learning today, but I'll write it up tomorrow, as I'll be dead tired when I get home tonight LOL.

I do need to get more work done on the research I actually want to do, though; I'm spending too much time building skills rather than using the ones I've learned. Another interesting idea that came to mind was the correlation between development and the number of gas stations. The number of gas stations is correlated with the number of cars, and the number of cars with population, but if a city has more cars, does that mean it is more developed? And doesn't a higher population imply a higher level of development, since most people would want to live in a more developed city? In Texan cities that may not be the case because of urban sprawl and the necessity of cars, and the issue may go beyond Texas, since it's not the only place with urban sprawl. We also see more affluent people starting to drive electric cars, which may affect the numbers too, but we can't be sure until we see the data. Finding the data will be the difficult part. This seems like a fun side project. I'll have to spend more time on it this weekend.
