This is an attempt to collect all publicly available data sets that pertain to Public Health.

So where are all the publicly available data sets (whether free or available at a cost)? For now, I am not looking for data that an entity has collected but does not share or cannot easily share. If you use one, or just know of one (or several), please complete the Public Health Datasets form at http://bit.ly/JpHkhJ.

Check back in a couple of weeks, when I will share the information.
In case you are wondering what qualifies as a 'Public Health Dataset', I am using the definitions available on Wikipedia. They are:
  • Public health is "the science and art of preventing disease, prolonging life and promoting health through the organized efforts and informed choices of society, organizations, public and private, communities and individuals" (1920, C.E.A. Winslow).
  • A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a particular variable. Each row corresponds to a given member of the data set in question. It lists values for each of the variables, such as height and weight of an object. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows.

 
 

Hover over the map to see the exact percentage of uninsured children. Colors range from dark green (fewest uninsured) to dark red (most uninsured).

These data were generated from statehealthfacts.org on 16 October 2011.
 
 
The political season always presents a wonderful opportunity to hear and view a lot of statistics. The media (e.g., FactCheck.org) and people in general are getting better at determining whether information presented by politicians is accurate. This is only part of the equation, however, as accurate information can still be presented in misleading ways. For example, in the latest Republican primary candidate debate on October 11, 2011, Mitt Romney touted health care in Massachusetts to Perry, stating, "We have less than 1 percent of our kids who are uninsured. You have a million kids."

The problem with this statement is that it compares a percentage with a count. We need to know the total number of kids in Texas to truly understand the comparison. Hence, the question should actually be: What proportion (or percentage) of kids in Massachusetts and in Texas are insured? (Note that 'kid' also needs to be defined.)
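To make the point concrete, here is a minimal sketch of the conversion from a count to a percentage. The function name is my own, and the totals in the loop are hypothetical placeholders, not real population figures; only the "million kids" count comes from the quote. The same count gives very different percentages depending on the denominator, which is why the count alone cannot be compared with "less than 1 percent".

```python
def uninsured_percentage(uninsured_count: int, total_children: int) -> float:
    """Convert a count of uninsured children into a percentage of all children."""
    if total_children <= 0:
        raise ValueError("total_children must be positive")
    return 100.0 * uninsured_count / total_children


if __name__ == "__main__":
    quoted_count = 1_000_000  # "a million kids", from the debate quote

    # Hypothetical totals, for illustration only (NOT actual Texas figures).
    for hypothetical_total in (2_000_000, 5_000_000, 10_000_000):
        pct = uninsured_percentage(quoted_count, hypothetical_total)
        print(f"{quoted_count:,} of {hypothetical_total:,} children = {pct:.1f}% uninsured")
```

Once the real total (and the definition of 'kid') is known, the resulting percentage can be set directly beside the Massachusetts figure.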

 
 
Recently, a researcher commented that dealing with data is “the easy part” of research. She conducts laboratory experiments that are time-consuming and indeed complex, yet I was taken aback that anyone would think dealing with data is “easy”.

While I wholly disagreed with the researcher’s statement, I needed to clarify what makes dealing with data difficult, as she is not the only one who thinks this way. The first step in explaining the challenges was to briefly outline the steps in the data process. What I came up with (as have others; see “Notable sites” in Part II) is that dealing with data involves: Collection, Cleaning, (Integration), Analysis, Interpretation, and Dissemination. In this essay, I discuss the first three.

 
 
Need: Profile photos for all the VIVO faculty members at my institution, sized to be easily imported into VIVO.

Background: This post will be part of the VIVO Implementation Guide, which is in development. For more information about VIVO, see vivoweb.org.

Issue: Most departments either did not have an aggregated file of photos they could send me, or the ones they sent were the wrong size. I also wanted a way to do this using readily available web-ware so little to no programming experience would be necessary.

Solution:
1) Use a web-mining tool to aggregate the data.
The easiest one I found was the Firefox plugin ‘Outwit Hub’. There is a free version, but I bought the full version (US $35 in March 2011) so I could scrape more than 100 sources at a time. It was well worth the price.

2) Use a photo re-sizing application to modify the photos to the required size.
The one I liked best was XNView MP. At the time, I downloaded the free, unstable beta version. Even though it crashed some of the time, I was able to re-size 800 photos in an afternoon.
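For anyone who would rather script the re-sizing step, the core calculation is scaling each photo to fit inside a target box without distorting it. The sketch below shows only that arithmetic. The function name and the 200 x 200 pixel target are assumptions for illustration, since the size VIVO requires is not given here, and the actual image reading and writing would still need an imaging tool or library.

```python
def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple[int, int]:
    """Return new (width, height) that fit inside the box, keeping the aspect ratio.

    Photos that are already small enough are left at their original size.
    """
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Use the tighter of the two constraints, and never scale up.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # Hypothetical target box; substitute the size your VIVO installation needs.
    target = (200, 200)
    for original in [(1200, 1600), (1600, 1200), (100, 80)]:
        print(original, "->", fit_within(*original, *target))
```

Keeping the aspect ratio avoids stretched faces, at the cost of photos that do not all share exactly the same dimensions.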


 
 
I have struggled with writing most of my life. For me, writing is grueling…time-consuming… aggravating…unnatural. Yet, the minute one commits to a life in academia, one might as well have applied for a degree in writing, no matter what the original major. With a Master’s in Biostatistics, a PhD in Epidemiology, and now a career in medical informatics, I have to write to survive. Grants. Manuscripts. Reports. Reviews. E-mails. Every week, every day, almost every hour of the working day. Many accomplishments depend on technical people, who are in high demand, yet we generally don’t like to write. So we have less experience writing, and on top of that we have very little time to write. It is a vicious cycle of intelligent illiteracy.

In an effort to either ease the burden you may experience or simply share my pain, I am sharing some lessons I learned in the process of writing an abstract for a recent conference. Quite a few weeks after submitting an abstract for a scientific conference I attend every year, I was crushed to have a ‘presentation’ demoted to a poster. (More on how that is not such a bad thing in another essay.) At that point I had a few options: 1) not present anything; 2) present the poster and leave the abstract as it was; or 3) present the poster but revise the abstract. After a bit of reflection, I chose option #3. This work would be indexed and available on the web, so I wanted it to be well represented. Below are the things I could have done the first time but at least did the second time around.