Blog

CHAPTER FOURTEEN: PRODUCTIVITY

One of the most common currencies for scientists is publications. There are, of course, many other measures like patents, grants or other funding, Investigational New Drug (IND) applications, (text)books, code repository forks/downloads, and many more. In general though, every measure of productivity boils down to some condensed summary of a ton of work, crafted for consumption by others.

Across almost all fields, not just science, people use resumes or curriculum vitae (a CV, which is basically just an extra long resume) to quickly and clearly communicate career productivity. Of course, resumes and CVs are imperfect, as they will never fully capture the nuance of someone's life and experience, but that also highlights how important written communication is. There's a whole rant here about how scientific productivity is most commonly measured by written communication, even though we never really get trained well in written communication as STEM majors, which is in part what launched my whole aspirational goal this month of writing 50k words. (Aside, to be clear, 50k words is not happening, as it's 7PM on November 30 here and I have at the moment just over 15k words, so I don't think it's possible to crank out another 35k words in five hours. I'll probably write a conclusions or summary post tomorrow that'll continue from this one as a reflective retrospective of how I think this inaugural #NovemberWritingChallenge went. Spoiler: I'm of course disappointed that I didn't even come close to 50k, but also I've learned a lot about what holds me back from writing.)

I’ve written previously about the COMMUNICATION aspect of publications and other avenues for dissemination of scientific work, but since the last chapter on MESS I’ve been thinking about it from the standpoint of repeatability and reproducibility in scientific literature. First, to clarify, repeatability typically refers to the same person repeating the same experiment with the same system and getting the same result, while reproducibility refers to a different person attempting the same experiment with the same or similar system and getting the same conclusion. Anything published should really be repeated, because if you can’t get the same result you’re claiming in your publication, then you probably shouldn’t publish it. But reproducibility can be harder. Borrowing from statistical descriptions of missingness, in my mind, there’s irreproducibility “at random” and irreproducibility “not at random”. The former is where biology is just hard and there’s some hidden aspect of the experimental system that is unknown, and this is where the scientist is not really at fault for irreproducibility. Irreproducibility “not at random” is where the scientist just did a terrible job of describing the methods, the system, or the analysis. I’m assuming laziness here and not straight malicious lack of detail, although there are of course examples of malicious intent by manipulated data or straight fake analyses.

Irreproducibility "not at random" is, at least in part, a problem of bad methods, speaking to that specific section of a scientific manuscript. Methods sections are the second easiest place for me to start writing a paper, aside from the results or figures, because I'm just describing what I did. Usually my methods are pretty generic and widely used in the field so they don't need much detail, but sometimes there are specific twists that I've added. It's not unlike cooking and having recipes. Most people have some idea of what goes into a chocolate chip cookie recipe, but some people might have a specific twist based on their personal taste preferences, like using brown butter instead of regular, or based on necessary accommodations, like adjustments to account for baking at high elevations. So the equivalent is that maybe the scientist behind the irreproducible work just doesn't realize that their method works great for them because their experiment is being done in Denver, but a scientist at sea level in Boston needs a different recipe, i.e. protocol or method.

Not to get back into the whole artificial intelligence (AI) debate, but maybe AI would be helpful for reproducibility of analyses. I’d be shocked if a lot of the papers coming out now aren’t using analyses that were written, at least in part, by AI like ChatGPT, Claude, etc. If people are already relying on AI to write their data analyses (and therefore guide their conclusions), then it’s not a huge leap to use the same AI to take things one step further and capture the whole “chat” and publish that as a supplemental method. At the bare minimum, people should be capturing the code and publishing those scripts or notebooks alongside their papers for repeatability, but I know a lot of people put terrible code out there that can’t be rerun by anybody else due to hardcoded paths or missing dependencies, and many more people just never even make their figure-producing code available.
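
As a minimal sketch of the bare-minimum version (a hypothetical, generic Python analysis script; the argument names and output files here are ones I made up, not any standard), even just avoiding hardcoded paths and writing down the environment goes a long way:

```python
# sketch of a rerunnable analysis script; paths and file names are made up
from pathlib import Path
import argparse
import json
import sys

def main():
    # take the data location as an argument instead of hardcoding /Users/me/...
    parser = argparse.ArgumentParser(description="Reproduce the paper's figures")
    parser.add_argument("--data-dir", type=Path, required=True,
                        help="folder containing the raw input files")
    parser.add_argument("--out-dir", type=Path, default=Path("results"),
                        help="where to write figures and tables")
    args = parser.parse_args()
    args.out_dir.mkdir(parents=True, exist_ok=True)

    # record the environment so someone else can rebuild it
    env = {"python": sys.version, "platform": sys.platform}
    (args.out_dir / "environment.json").write_text(json.dumps(env, indent=2))

    # ... actual analysis would go here, reading only from args.data_dir
    # and writing every figure-producing output into args.out_dir ...

if __name__ == "__main__":
    main()
```

Pair something like that with a pinned dependency list (a requirements.txt or lockfile) and the actual figure-producing notebooks, and at least the repeatability half of the problem is mostly covered.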

Maybe AI could go one step further though? If we capture the protocols, alongside the data processing, and the result-producing post-processing analyses, that should be close to the ideal reproducibility scenario. I don't know exactly what that might look like in practice, but that's something that would massively help me in implementing other people's methods in my own lab. Upload someone else's paper, and have the AI generate a shopping list based on the methods so that I can get all the supplies I need, and then spit out the protocols based again on the methods and a bit of the results, maybe. There's probably also something like that out there, if only for cooking.

Anyway, all of this reproducibility ramble got me away from the initial thought, which was productivity as measured by writing. In management-speak, what goes into the resume or CV are the OKRs (Objectives and Key Results) and what helps get you to the OKRs are the KPIs (Key Performance Indicators). Setting goals, or even "resolutions" as we're now coming up on the end of 2025, is usually at the OKR level, but without some KPI to help get you there, you're probably never going to make your goal or resolution. When you set out to run a marathon, you don't usually just walk up to the start line and then bang out 26 miles. Usually you decide that you want to run the marathon (OKR) and then break it down into a training plan with gradually increasing mileage each week (KPI). As another example, this whole writing challenge this month to write 50k words (OKR) came with some clear daily mini-goals like shooting for 2k words/day (KPI).
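
For what it's worth, the arithmetic behind that particular KPI is about as simple as it gets; here's a tiny sketch with the numbers from my own challenge:

```python
# back-of-the-envelope pace for the writing challenge
goal_words = 50_000                      # the OKR: 50k words in the month
days_in_month = 30
min_pace = goal_words / days_in_month    # ~1,667 words/day just to break even
kpi_pace = 2_000                         # the KPI I actually aimed for, with some buffer

print(f"minimum pace: {min_pace:.0f} words/day, target KPI: {kpi_pace} words/day")
```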

One place I think people struggle, both with methods for papers and just in general productivity measurement, is figuring out whether the KPI is really what the audience needs to know, or if the audience really just cares about the OKR. For the methods section, you really need to be specific and detailed, but for reproducibility, it's enough to be shooting for the OKR – in fact, it's probably even better for the scientific community and furthering human knowledge if we can reproduce the idea or conclusion by orthogonal means, rather than directly reproducing the exact experimental conditions, which might be correlated with but not causal to the conclusion being drawn by the original scientist.

Similarly with personal productivity, it's easy to punch out a bunch of KPIs and make progress in ticking off to-do list boxes, but if you're not keeping the OKR in mind, you may be doing a bunch of busy work without making meaningful progress on the real goal. "Not everything that can be counted counts, and not everything that counts can be counted", as William Bruce Cameron is quoted as saying. A few weeks ago, I was talking about this with some other women at a professional event, where we were discussing the challenges of accommodating alternative paths. Specifically, the conversation turned to the subject of childcare stipends for conferences, and a couple young mothers emphatically supported the idea. To be clear, I love the concept of stipends to help people afford traveling to professional events and career advancement opportunities. That said, a couple other women, myself included, cautioned that bringing children to the conference may prevent you from getting the full value of the conference, which isn't usually in the formal programming but rather in the informal networking that tends to happen in the evenings after the official programming ends. For me this is a personal observation: when I've tried to bring my child to conferences, it has ended with me doing a pretty terrible job both professionally and personally, and nobody got my full attention because I couldn't engage fully in either setting. While, yes, I hit the KPIs of "attend conference, give talk" and "conduct bedtime routine", I didn't really make progress on the OKRs of "advance career" or "be a present, available parent". That said, I also recognize that I'm lucky to have the option of leaving my child with my partner when I go to conferences now, so that while I miss spending time with my family when I travel to events, it's a choice that not everyone has the luxury of making.

I certainly wish there was a better system, both speaking specifically of conference networking and also more broadly of productivity measurements, but until scientific society at large changes, I think we're stuck with the aforementioned productivity metrics and figuring out tools to manage or otherwise cope with them.

CHAPTER THIRTEEN: MESS

Being a military brat growing up, I moved a lot. About every three years, we would move halfway across the country (or, when I was very young, halfway across the world) so it’s not too surprising that it was only recently that I ever spent more than a couple of years in one spot.

There are some advantages, possessions-wise, in moving frequently. Moving sucks, logistically, so it incentivizes minimalism. The less stuff you have, the less you need to pack up. It also encourages you to sort through what you have, because if there's some pile of boxes in the corner that never even get unpacked, then it's likely there's nothing you need in those boxes.

A few years ago, specifically 2019, the popularity of Tidying Up with Marie Kondo hit right at the same time as I was moving across the country (again). In addition to a physical, geographical transition, I was also transitioning emotionally and professionally, having just gotten married and defended my PhD roughly 48 hours apart. All this to say that it was a perfect moment to reflect on all my physical stuff as I was packing it into boxes, and think about what was serving me by either literally being frequently used or by being emotionally valuable. Everything else was clutter that was taking up space in my small UHaul and should be donated or disposed of.

Having these "tidying" moments is useful for cleaning up a literal house and also a metaphorical house. In science and tech, there's the idea of "tech debt", where solutions do technically work for some problem, but they're rushed, suboptimal, and eventually cause problems themselves because they don't scale or require too much frequent fixing. So, every so often, it's helpful to pause, inventory what's going on, and spend some time resolving that tech debt so that there are fewer problems in the future. Tidy up. Declutter.

I don't think anybody ever has a perfectly minimal house, or a perfectly debtless tech stack. Those only exist in magazines and are almost entirely staged to look that way. The incentives to "move fast and break things" are too strong in tech work to always take the slow, thorough route. But even if you have to take on some tech debt, it's always nice to keep that debt manageable so that you don't end up with the tech equivalent of a hoarding situation, or a critical process that ends up as a duct-tape-and-chewing-gum system that requires Windows 7 or something to remain functional.

Tech debt (and clutter) can also indicate a lack of focus or thoughtfulness. The quick and easy solution is not always the right one; it's a "measure twice, cut once" or "buy it for life" kind of a situation. It's the same with experimental design in the wet lab. Without thoughtful experimental design, like including all the proper negative and positive controls required to adequately interpret the results, you end up with a pile of uninterpretable data that needs to be repeated anyway. The simpler and more straightforward the experimental design, the more likely you are to be able to draw a conclusion from it.

Too frequently, though, scientists are encouraged to move quickly. Usually too quickly to really be doing the science as rigorously as they should, or to set up the processes required to sufficiently repeat an experiment. Some of this might come from misaligned incentives, as the publish-or-perish mentality in academia demands high output and productivity without the timelines that allow truly innovative ideas to come to fruition. Some of it is bad processes, as the manuscript-write-up process is usually only built around "successful" experiments, which the replication crisis has shown are not infrequently lucky one-off observations rather than truly reproducible phenomena.

The lack of reproducibility in science, the "reproducibility crisis", is another form of tech debt, in a way. There's a lot of "clutter" in the scientific literature, things that have been disproven or haven't replicated and should probably be reduced or eliminated. Most scientists go through the rite-of-passage that is composing a "literature review", basically reading everything they can find about a topic and organizing it into a "too long; didn't read" compilation, usually with some minor commentary or perspective to frame the topic.

While a lot of "mess" makes it into the scientific literature, there's also a lot of really good stuff that never makes it. It fades away because the grad student or postdoc moved on, or there was one last experiment to do that never got done, or just something else exciting came along and became the new shiny thing to work on. It's the complete opposite side of the spectrum, a lack of work that should be out there but instead is cluttering someone's abandoned to-do list.

It’s a shame that there’s probably a lot of good work out there that never got the attention it deserved, and it’s equally a shame that there’s a lot of bad work out there making a mess of the literature. 

CHAPTER TWELVE: COMMUNICATION

A brief break from the deep subject matter expertise posts, because it has me thinking broadly about scientific communication. Specifically, that formal training (i.e. undergrad and grad school) never prepared me for real scientific communication.

The focus in school was always communicating science to non-scientists, usually children from middle school through high school. While getting kids interested and excited about STEM subjects is worthwhile, this focus on middle/high school scientific communication really did me a disservice in my professional life and probably even my personal life.

There are so many more facets to scientific communication that really don’t get addressed sufficiently in higher education circles. I’m thinking about how I have, professionally, had to figure out communicating very technical, jargon-heavy concepts to a variety of really smart people who have expertise outside of my hyper-focused niche. The way I might approach communicating what I do (mass spectrometry, proteomics, transcription factors, etc) completely depends on the audience. A venture capital investor, for example, is thinking about things from a financial perspective, while a pharmaceutical scientist is thinking about this from a drug discovery perspective, and an oncologist is thinking about the outcome and impact on patients in a clinical trial. Nobody is “wrong”, nobody is smarter than anybody else, everyone’s just thinking about the same thing from a different perspective and with a different lens. 

It follows, then, that communicating the same topic needs to be framed specifically for different audiences. It doesn’t mean that the core concepts are changing, just that the language needs to be different for the most effective communication. Maybe language is an interesting parallel, where translating the same message into different languages shouldn’t change the core message, but using the right language for the right audience is going to make communication much easier than forcing everyone to do the translations themselves, or even worse just zone out and not even listen to the message at all.

None of my undergraduate or graduate school outreach opportunities touched on this concept of science communication to adults, really. As far as I can recall, it always focused on making fun, hands-on “labs” or demonstrations for kids to learn scientific concepts. Then I got into the “real world” and suddenly cute little demonstrations aren’t really working anymore.

An obvious example of where scientific/STEM communication can go really right or insanely wrong is with policy. Policy makers might consult scientists and doctors and other professionals, piece together all of the best expert advice to write into laws or regulations or recommendations, but without effective communication, policies are relying solely on people following guidance based on an appeal to authority. Sometimes that works, but a lot of times it doesn’t. In my own work, appeal to authority has very rarely worked out. I don’t have much authority outside of my hyper-niche specialty, and so the communication (good or bad) is weighed much more heavily.

I think there's a lot we could learn about communication, as a scientific community, from novelists and screen/scriptwriters. Crafting a story can help hook an audience into a message. This is another place where grad school trained me, but maybe trained me specifically to give scientific talks to scientific audiences; I've had to relearn how to build a "story" based on the sort of "plot" or pacing that communicates the message best. I think classic literature tropes (e.g. the hero's journey, tragedy, comedy) could translate really well through the lens of telling a scientific story. There are some examples of nonfiction biographies or histories that have done this for biotech stories, like the books Living Medicine and Billion Dollar Molecule, which retell the history of bone marrow transplants and the founding of the pharmaceutical company Vertex, respectively, but they do it with a framing that helps tell the historical story and scientific journey in a way that is nice to read.

I’m sure with both of those books that the exact history isn’t perfectly captured, and in part can never be because the way the stories are told involves so many peoples’ specific memories and emotions and motivations, but the general approach is something I really admire from a communication standpoint.

I think the books work so well, in part, because the reader can pattern match the general story arc to other novels. There’s some backstory setting up the scene and the characters, there’s some tension or suspense that puts the main characters through a challenge, and then there’s a resolution by the end of the book. There’s some subplots along the way, maybe some romance or comedy. 

Lack of storytelling is why a lot of scientific communication falls so flat. I've sat through a lot of scientific presentations that are just a linear chronology of all the experiments that the presenter has ever done. The right story, though, is almost never the chronological story. Although there are some contexts where the chronological story is motivational to the audience, I think most audiences are thinking "Why should I care?" so you need to directly say out loud why they should bother listening and paying attention to you. For startup pitch decks, usually the first slide either directly states the problem that the company proposes solving, or it presents the financial opportunity (market size); both directly tell the audience why they should care, because it's a problem that they themselves can recognize or it's an opportunity to make money. For a scientific presentation, usually an element of teaching goes a long way, so that the audience cares because you've taught them something new. (Put another way, making the audience feel smart/smarter is a good motivator for a scientific audience, who is probably always looking to learn more.)

In scientific manuscripts for peer reviewed journals, there’s definitely a certain pattern that I’ve come to expect from papers. For a basic four-figure paper, the first figure is some kind of method or overall experimental schematic. The second figure is a high-level visualization of the data, like a heatmap or a dimension reduction like principal component analysis (PCA), t-SNE, UMAP. The third figure is a deep dive into some slice of that big dataset, just visualized differently. And then the final figure is some orthogonal experiment to prove figure three correct, and/or a schematic of some biological mechanism that the data suggests. Pattern matching that template helps get through papers pretty quickly, because I can just flip to the figures and usually they follow some general flow like that.

Having some portion of the communication being predictable helps get the message across. If the message itself is unexpected or difficult, then having the medium be predictable or the presentation be predictable can help, I think. Predictability isn’t a bad thing. There’s something comfortable about knowing what to expect, and when it changes abruptly, it can be jarring. I’m thinking about things like when your favorite band has a particular style of music that they produce, but then there’s that one weird random song that doesn’t fit the vibe and sticks out badly. (Of course, some people are amazing across genres, and there’s some scientists like that, too, who can easily hold their own across multiple fields.)

Something I use almost always in my scientific presentations is a "three-act" structure. My talks almost always start with some brief introduction to set the "scene", maybe 10-15% of the total talk. Then, I set up three main, take-home messages for the audience, and each of the three is about 25-30% of the talk and somewhat builds on the one before. Finally, with the remaining 10-15% of the talk, I have some "cliff-hanger" future work – not the next obvious logical step based on what I've said, but some more distant future vision. It's not something I just happen to do; it's something I intentionally do whenever I sit down to organize a talk. I usually start by deciding on the three main messages, set each of those up, then do a little scene-setting/exposition at the beginning that gives just enough context for the three messages, and then a little bit of forward-thinking "cliff-hanger" at the end. I don't claim to be the best presenter or anything, but I've been invited to speak quite a bit so I figure something must be resonating.
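
To make those proportions concrete, here's a rough sketch of the time budget for a hypothetical 45-minute slot (the slot length is just an example, and the exact percentages obviously flex from talk to talk):

```python
# rough time budget for a "three-act" talk, using the percentages described above
slot_minutes = 45  # hypothetical talk length, just for illustration

budget = {
    "scene-setting intro": 0.12,        # ~10-15%
    "take-home message 1": 0.25,        # ~25-30% each
    "take-home message 2": 0.25,
    "take-home message 3": 0.25,
    "cliff-hanger future work": 0.13,   # ~10-15%
}

for section, fraction in budget.items():
    print(f"{section}: ~{fraction * slot_minutes:.0f} min")
```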

None of this is meant to be an immediate solution to any scientific communication struggles, and again to be clear I don’t mean to imply that I’m a significantly better communicator than anybody else. In part, this is because I don’t think we scientists get enough training on communication, and in part because it’s just hard anyway, even if we did get trained. I’m inspired by writers, though, because I think if we structured scientific communication to lean on common literature, like plot devices and story structure, we’d probably capture a wider audience and have more buy-in and support from policy makers, funding agencies, and even the general public.

CHAPTER ELEVEN: INTEGRATION

This chapter will be a continuation of the last, "PROTEINS", staying along the central dogma measurement theme. Previously I focused primarily on how to measure proteins, but don't get me wrong, measuring the other molecules (DNA and RNA) is also valuable and important. No one molecule is going to hold all the information to crack biology, and that sentiment is what's generally behind the hype and excitement of using "AI" for biology.

A quick aside: I don't love the "artificial intelligence" term. Back in my day (ha) it was "machine learning" but for some reason "AI" has replaced "ML". And for that matter, when "AI" is used lately, it's almost always synonymous with "large language model"/LLM, i.e. ChatGPT. That's one type of AI (ML), and while impressive, I don't think LLMs are really going to be the type of ML to crack biology.

The hype or promise of making AI really useful for biology is to improve drug discovery, or personalized medicine, or diagnosing disease, or even just finding new fundamental knowledge and connections we haven’t made previously. That last part is where my doubts come in. We (speaking for the greater scientific community) haven’t really had much success integrating multiple molecule measurements together. There’s a lot of work towards multi-omics, that is, collecting multiple ‘omic measurements on the same or similar samples, like doing both DNA sequencing and RNA sequencing and even proteomics all on the same sample.

But I haven’t seen much multi-omics analyses that are truly integrating the data together. Most analyses I see analyze each individual data type or measurement, arrive at an individual data type’s conclusion, and then move onto the next data type. It’s more sequential ‘omics than multi-omics, in my opinion.
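
To make the distinction concrete, here's a toy sketch (with completely made-up matrices) of the difference between analyzing each 'ome on its own and at least naively integrating them, here just by standardizing and concatenating the features into one shared model. Real integration methods are far fancier than this; the only point is that the data types end up in the same analysis instead of in sequence.

```python
# toy contrast: "sequential omics" vs. a naive joint analysis (made-up data)
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 20
rna = rng.normal(size=(n_samples, 500))      # pretend transcript abundances
protein = rng.normal(size=(n_samples, 300))  # pretend protein abundances

# sequential: each data type gets its own analysis and its own conclusions
rna_pcs = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(rna))
prot_pcs = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(protein))

# naive integration: both feature sets go into one matrix, then one shared model
joint = np.hstack([StandardScaler().fit_transform(rna),
                   StandardScaler().fit_transform(protein)])
joint_pcs = PCA(n_components=2).fit_transform(joint)

print(rna_pcs.shape, prot_pcs.shape, joint_pcs.shape)  # (20, 2) for each
```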

It’s interesting, too, because there’s significant efforts to use one data type in order to predict another data type. For example, using RNA sequencing data (transcriptomics) to predict which proteins will be present and at what abundance. These predictions are imperfect at best, and just straight wrong at worst. Obviously, if they worked better, it would be great to use a cheaper or easier measurement to predict a more expensive or laborious measurement.

Because we haven’t had much success in “simple” computational approaches to integrate across multiple measurement systems, I’m a bit skeptical that we’re going to be able to put together an AI that really cracks being able to predict biology in general. We just don’t have a good enough understanding of the data yet, so I’m not sure how you can train a computer to make sense of something that we don’t yet understand ourselves.

The lack of “training data” is the big hurdle, I think. LLMs like ChatGPT were enabled, in part, because there’s such a huge volume of text to train on, but there’s nowhere near the same volume of data for biology. Further, of the biological data we do have, the majority of it is poorly annotated, if at all. In other words, there’s a data file somewhere, but we don’t really know where it came from, what it’s supposed to be measuring, and whether the quality is good or not. 

(Aside, this is for sure true of proteomics data measured by mass spectrometry, although perhaps some other fields are better about data annotation and labeling. The “metadata” of files, including even some simple things like what organism was measured, isn’t always provided with the files. There’s efforts to fix this, both by prospectively requiring metadata files at the time of data submission for academic, peer-reviewed publications, and also by retroactively going back through archives of the most popular or high quality data and backfilling metadata for them. Overall, it’s going to be tough for mass spectrometry proteomics either way, because even if we get the data files annotated with correct metadata, there’s a second problem of needing to have those raw data files be processed in some way to make them more useful for machine learning, at least ML with the intent to infer or predict biology.)
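
For illustration only, a bare-minimum metadata record for a single raw file might look something like the sketch below. The field names are ones I made up for this example, not any official standard; the real prospective efforts define their own, much more thorough schemas.

```python
# a made-up minimal metadata record for a single raw mass spectrometry file
from dataclasses import dataclass, asdict
import json

@dataclass
class RawFileMetadata:
    file_name: str          # the raw data file this record describes
    organism: str           # what was measured (even this is often missing)
    sample_type: str        # e.g. cell line, tissue, plasma
    instrument: str         # which mass spectrometer acquired it
    acquisition_mode: str   # e.g. DDA or DIA
    quality_note: str       # free text: is this file known-good or suspect?

record = RawFileMetadata(
    file_name="run_0001.raw",
    organism="Homo sapiens",
    sample_type="plasma",
    instrument="(instrument model here)",
    acquisition_mode="DIA",
    quality_note="passed in-house QC",
)

print(json.dumps(asdict(record), indent=2))
```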

I might be totally wrong though. There’s the classic essay, “The Bitter Lesson”, by Rich Sutton, which suggests that we don’t need more data, or better data, or be able to integrate data together; instead, that essay suggests that “the only thing that matters in the long run is the leveraging of computation”, and therefore we don’t need to worry about current limitations or challenges with multi-omics or amount of data. With the Bitter Lesson philosophy, instead we should feed the computers what we have so far, and see if they can train on that and discover something new.

So far, I'd say, at least from my own personal experience, the computers have only gotten good enough to predict the average, at least when it comes to predicting the DNA-binding activity of proteins given a certain biological context. For example, the model predicts a given protein to have roughly the average DNA-binding activity across the other contexts it has seen. This is the real-life equivalent of the computer looking at a haystack and saying "it's all hay" even though we, as humans, know that there's a few needles in there; but because the pile is mostly hay, the computer averages the pile out.
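
Here's a toy version of that "it's all hay" behavior, with entirely made-up numbers: if the features carry no usable signal about the rare cases, the loss-minimizing answer is basically the average of the training labels, so the model calls everything hay.

```python
# toy demo: with uninformative features, a regression collapses to the mean (made-up data)
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 10))   # features that carry no real signal
y = np.zeros(n)                # the "hay": most activities are near zero
y[:10] = 5.0                   # a few "needles" with high activity

model = Ridge(alpha=1.0).fit(X, y)
preds = model.predict(X)

print("label mean:", y.mean())                  # ~0.05
print("typical prediction:", np.median(preds))  # hugs the mean
print("prediction for a 'needle':", preds[0])   # nowhere near 5.0
```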

While the Bitter Lesson philosophy can be interpreted into biological AI/ML modeling as letting the computers figure it out for themselves, I think biological data is still too abstract to completely leave the computers to their own devices (pun not intended) without having some human knowledge injected. Biological data (the “input”) is already too abstracted away from conclusions that we try to make from it (the “output”), so it’s not really fair to expect a computer to be able to go from one side to the other without giving it enough examples to learn from.

There’s still a requirement, then, for having enough data, and really it’s not just having enough data, but having enough signal in the data for the computers to learn from. The real-life equivalent here is like if you trained a computer to label animals in a photograph, but only fed the computer photos of dogs, cats, and birds to learn from, then it wouldn’t be surprising for the computer to not know what a squirrel is. Similarly with biological AI/ML, if we’re not feeding the right data, we shouldn’t be able to expect the computer to predict something it’s never seen before.
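
The squirrel problem in one tiny, made-up example: a standard classifier can only ever answer with the labels it was trained on, so a "squirrel" is going to get called a dog, a cat, or a bird no matter what.

```python
# toy demo: a classifier can only predict labels it has seen (fake features)
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X_train = rng.normal(size=(300, 8))                     # pretend image features
y_train = rng.choice(["dog", "cat", "bird"], size=300)  # only three labels exist

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

squirrel_features = rng.normal(size=(1, 8))  # something the model has never seen
print(clf.classes_)                    # ['bird' 'cat' 'dog'] -- no squirrel option
print(clf.predict(squirrel_features))  # forced to pick one of the three
```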

In current times, we’re seeing similar things with ChatGPT, like the funny observations where ChatGPT can’t count how many letter “b” there are in the word “blueberry”; ChatGPT is a model trained to predict the next words based on the previous words, and the way it “thinks” about words isn’t even in words but in “tokens”, so it’s not surprising that the model doesn’t work correctly when it’s being used in a way it wasn’t trained to be used.
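
Counting letters is a trivial, exact computation when you can actually see the characters; the catch is that an LLM doesn't see characters, it sees tokens. (The token split in the comment below is purely illustrative – real tokenizers vary by model.)

```python
# counting characters is exact and easy when you actually see the characters
word = "blueberry"
print(word.count("b"))   # 2

# an LLM never sees the string this way; it sees something like
# ["blue", "berry"] or other subword tokens (an illustrative split, not any
# real tokenizer's output), and then predicts text from those tokens, which
# is why letter-counting questions can trip it up.
```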

The idea of foundation models fits into this, in my mind, because they’re meant to be stand-alone models trained for a specific use, but could go beyond their original task to other more general tasks. Foundation models are quite popular in biology and chemistry lately, with everyone looking to build their own foundation model based on whatever their niche expertise happens to be, and I suppose hoping that the “generalizability” will come as a happy surprise after the model is built. Guilty of this myself, as we’re building some foundation models at my company; why not, if we have the data and the expertise to try it?

There’s plenty of criticism for foundation models as being overhyped these days, since so many people are working on them. Some of these models are doomed to fail, I think, because they’re relying too much on publicly available data which suffers from the annotation and quality challenges I mentioned above. Others will fail because of hubris, where there’s some pretty big egos who think that the only reason biology hasn’t been “solved” is because their genius input hasn’t been contributed yet. Realistically, I think there’s still some fundamental housekeeping work that’s required to get to useful AI/ML for biology, namely in data processing and cleanup (again, like the metadata and quality issues above), but also just in how we present the data to the computer. You can’t just feed data to a computer, you need to turn the data into something the computer understands. This is something I’m not sure we’ve figured out yet.

In the end, despite all the negativity I've got going on in this ramble, breakthrough AI/ML models for biology and chemistry aren't just inevitable – it's already happened, so it's just a matter of time before more arrive. The 2024 Nobel Prize in Chemistry went to AI/ML modeling of protein structure (AlphaFold and Rosetta), so I think it's unlikely to be a one-hit-wonder, and we'll get more interesting (if imperfect) models in the coming years.

After all, “all models are wrong” (George Box) but some are useful.

CHAPTER TEN: PROTEINS

Double post day! I’ve been wondering whether I should do a more “professional” content chapter instead of general personal rambles I’ve been doing so far, and I think today we’ll be doing a casual introduction to proteomics. This is going to be poorly cited, if at all, and not peer reviewed or particularly in depth, it’s just my own description of how I think about proteins. Although I’m professionally a deeply trained “proteomicist”, I still think of proteins in a kind of personified view. For whatever reason, that’s just how my brain works. There’s some great academic resources on the history of proteins and the study of proteins (“proteomics”), like this one from the Human Proteome Organization (HUPO). This chapter isn’t going to be that, though.

Some people have taken a biology class in high school before and gone over the classical "central dogma" of biology, which is that DNA is transcribed into RNA, which is then translated into proteins, and then proteins are the molecular machines that do most of the "work" around a cell. Proteins are the motors that pull cargo from one side of the cell to another, they're the catalysts that turn one molecule into another, and they're the control switches that turn genes on or off. Each of those "jobs" makes up a class or family of proteins: in the previous sentence, those are kinesins, enzymes, and transcription factors, respectively. Transcription factors are my personal favorite lately, with the work I do at my startup company, but all proteins are pretty cool and it still blows my mind to watch the artistic interpretations of proteins in action like this or the paintings by David S. Goodsell.

Measuring proteins is a lot harder than measuring DNA or RNA. While genome sequencing has become pretty mainstream, with direct-to-consumer kits to sequence yourself (23andMe) or even your dog (Embark), proteomics has had a harder time getting up to the Moore's Law trajectory. DNA is built from four nucleotide bases; proteins are built from 20 amino acids. The range of size and chemical properties across the 20 amino acids is much more varied than the sizes and chemical properties of the four bases. Between that and the combinatorics, the base components of each molecule are one thing that makes measuring proteins so hard.
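
Just to put rough numbers on the combinatorics (the sequence length of 10 is an arbitrary choice for illustration):

```python
# number of possible sequences of length 10, built from each alphabet
dna_options = 4 ** 10       # 4 bases -> 1,048,576 possible 10-mers
protein_options = 20 ** 10  # 20 amino acids -> 10,240,000,000,000 possible 10-mers

print(f"{dna_options:,} DNA 10-mers vs {protein_options:,} peptide 10-mers")
```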

Another complication is the "dynamic range" of the molecules. Dynamic range refers to how some molecules are very common and highly abundant, while other molecules are rare and lowly abundant. This is the "needle in a haystack" problem, where some protein molecules are very common and abundant (the hay) and others are rare (the needle) – it's hard to count how many needles might be in a haystack because there's just an overwhelming amount of hay to deal with. It's the same thing with proteins, where some proteins are just really abundant, like albumin, while other proteins are present at very low abundance, such as peptide hormones like insulin or signaling proteins like cytokines. There are tricks you can do to manipulate the metaphorical haystack to make it easier to find the needles. Enrichment techniques like immunopurification bind the protein of interest using an antibody to fish the protein out of the mix, basically like using a magnet to pull out the metal needles from the hay; depletion techniques light the hay on fire to leave behind the needles. At the end of the day, even with sample manipulations, dynamic range is still a challenge. For a more academic breakdown of the dynamic range problem, one of my PhD advisors did a theoretical breakdown of why the dynamic range of proteomics makes it orders of magnitude more difficult than other 'omes.
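
To put ballpark numbers on that dynamic range (these are rough, commonly cited plasma concentrations, not measurements of mine): albumin sits around tens of milligrams per milliliter, while some low-abundance signaling proteins sit around picograms per milliliter, which works out to roughly ten orders of magnitude.

```python
import math

# rough, commonly cited ballpark plasma concentrations (grams per milliliter)
albumin = 40e-3    # ~40 mg/mL, the "hay"
cytokine = 5e-12   # ~5 pg/mL for a low-abundance signaling protein, the "needle"

orders_of_magnitude = math.log10(albumin / cytokine)
print(f"~{orders_of_magnitude:.0f} orders of magnitude apart")  # ~10
```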

A third challenge is that there is no copy-paste for proteins. DNA and RNA have an amazing trick called polymerase chain reaction (PCR), which essentially means that scientists can take one single molecule and make as many copies of it as they want. Proteins, sadly, don't have this miraculous invention. It would surely be a Nobel Prize for anybody who figured it out, though. So, whatever you want to measure, you're not only going to have problems based on the complexity of the biochemistry and the analytical dynamic range, but you also can't make more material if you need to try it again.

Nevertheless, there are indeed ways that we measure proteins. A lot of ways, really, and scientists have been doing it for a long time. This won’t be an exhaustive list by any means (again, this isn’t peer reviewed or anything, it’s just me riffing off the top of my head and the tips of my fingers) but I’ll focus here on the measurement techniques that tell you protein identity or components, rather than overall protein concentration like Bradford, BCA, or Nanodrop. 

There’s old school Edman degradation, where a protein is stretched out into the linear string of its component amino acids, then each amino acid is chopped away one by one to come up with the full sequence of the protein. There’s antibody-based approaches, like Western blots or fluorescent assays, where you have a tool kit of antibodies which will recognize partial shapes of proteins, and you can use those antibodies to “light up” different proteins; if an antibody lights up, it means that protein is present in the sample. There’s mass spectrometry, which smashes up the protein and then reassembles the pieces using the mass (as the name implies) of each amino acid as a puzzle piece to fit the sequence of the protein back together. Finally, there’s new technologies like nanopore sequencing, which uses electric currents passed through each amino acid of the protein to identify which amino acid it is, and piece the protein sequence together that way.
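
For the mass spectrometry "puzzle pieces" bit, the core arithmetic really is that simple: a peptide's mass is the sum of its amino acid residue masses plus one water. Here's a toy sketch (standard monoisotopic residue masses, rounded, and only the ones needed for this example peptide):

```python
# monoisotopic residue masses (Da) for the amino acids in this toy example
residue_mass = {
    "P": 97.05276,   # proline
    "E": 129.04259,  # glutamate
    "T": 101.04768,  # threonine
    "I": 113.08406,  # isoleucine
    "D": 115.02694,  # aspartate
}
WATER = 18.01056  # the peptide chain gains one water overall

def peptide_mass(sequence: str) -> float:
    """Sum the residue masses and add one water to get the neutral peptide mass."""
    return sum(residue_mass[aa] for aa in sequence) + WATER

print(f"{peptide_mass('PEPTIDE'):.4f} Da")  # ~799.36 Da
```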

All are pretty wildly creative ways to solve the protein measurement problem. For the record, the most common way to do DNA sequencing these days uses the "light up" approach, where each of the four bases gets a specific color assigned to it and the sequence is read out one base at a time, not entirely unlike the Edman degradation approach in reverse (the modern workhorse is called sequencing-by-synthesis; the older Sanger method instead uses fluorescently labeled chain terminators) – a base gets added to the growing strand and lights up "red", a tiny microscope inside the sequencer sees the "red" and assigns that to its respective nucleotide, and the fluorescent blocker gets cleaved off. The next base is added and lights up "green", the microscope records that nucleotide, and the blocker is cleaved off for the next; and so on and so on until the entire piece of DNA has been sequenced, base by base. It's pretty smart, and there are other flavors of DNA sequencing out there too, but that's really the gist of it.

Unfortunately it doesn't work for proteins, at least not for proteins in a complex mixture like, for example, blood. It's a math problem, in that doing sequencing with the light-up bits and microscopes is inherently limited by the physics of light, something my PhD advisor details more in his manuscript, but the TL;DR is that the flow cell (the experiment) for DNA sequencing easily fits in the palm of your hand, while, based on the maximum feature density set by the wavelength of light, the flow cell to do the equivalent protein sequencing would need to be about 1 m^2.

Some people are trying to get around that problem by using the enrichment/depletion tricks I mentioned before, so that the proteome is less complex and therefore doesn’t require such a big experiment (flow cell) system, but then you’re also limited to only measuring a portion of the proteome/haystack.

Probably the "next gen proteomics" approach I'm most excited about is nanopore sequencing, where they pull the protein through a literal pore (which, hilariously, is actually made up of OTHER proteins itself) and use electrical currents to determine which amino acid is going through the pore. There's been a lot of work on this for DNA sequencing, but again, DNA is easier because there's only four bases and they're all chemically similar, while proteins have challenges 1 and 2 from above, in that there are 20 amino acids and they're highly varied in size and chemistry. The size and chemistry variation makes it hard to fit everything through the "same" pore – if the literal hole in the pore is too big, the electrical current won't be sensitive enough to detect the small amino acids, but if the hole is too small, the really big amino acids won't be able to fit. So while nanopore sequencing is, to me, the most science-fiction/fantasy approach of them all, it's also quite a ways from being commercialized for proteins, I think.

We’re stuck with mass spectrometry for now, for the most part. That’s not to say mass spectrometry is perfect. It’s a cool $1M to buy a mass spectrometer, so we desperately need better ways to measure proteins. I’ve written a ton about that in the past, both formally and informally. There’s lots of startup companies trying to solve the protein measurement problem using the various approaches above, as mentioned, or at least introduce some alternative ways to measure proteins, but unfortunately nothing is really available yet. Some are close, I think, but price is going to be a barrier still because I’m sure they’ll be expensive.

In the end, I’m excited to see what might come out in the next 2-5 years. I’m not personally or professionally banking on mass spectrometry remaining the only really viable technology for proteomics although I certainly will always have a soft spot for mass spec. It’s cool to think about what would be possible with a cheaper, faster, and/or more sensitive protein measurement system.

CHAPTER NINE: LIMINAL

The purgatory of being in-between naive beginner excitement and rational experienced master is basically where I live for most skills, just right there in the Trough of Disillusionment on the Gartner hype cycle. It feels like there’s a ton of support for “getting started”, and a ton of support for highly niche specialization, but just not a lot that helps you get through the purgatory of “intermediate”. This goes for learning a new language, or how to code, or baking, or entrepreneurship, or writing, or managing, or whatever. There’s so much out there to support the zero-to-one initialization, and then there’s deep subject matter expertise, but the “messy middle” is really hard, seemingly endlessly wandering through a liminal space.

I don't really feel like I'm hitting "enlightenment" in anything, just maybe finding how much farther down the "disillusionment" goes, but I'm also not anywhere near "inflated expectations" for any of my skills, so that seems to put me pretty solidly in the "intermediate" range. I love picking up new things – I mentioned before that I challenged myself to learn how to bake macarons, and that I wanted to learn how to code so I joined a machine learning lab – and it's so frustrating to get the basics down then just have zero resources to get through "intermediate" to "fluent" or "advanced". There's the 10,000 hours rule, which says that it takes 10,000 hours to master a skill, but that seems to emphasize the point that there's always tons of resources to help you with the first 10-100 hours, but after that it's just supposed to be grinding until you reach near-mastery and can get into the ultra-deep niche groups, I guess.

So how do you get through "intermediate" to be considered a "master"? Although it's called the 10,000 hour rule, realistically that's more like 5 years of fairly dedicated training (10,000 hours at roughly 2,000 working hours a year), so about the length of the average American science PhD program. The first year is structured classes like high school or college, with more in depth materials, but after the first year or two it's all unstructured research, largely self-guided with some input from your PhD advisors and thesis committee members. In the end, when you defend and get the "PhD" letters after your name, society generally recognizes you as a "master" in that subject, which is itself a hilariously sub-sub-sub-field specific niche, a tiny drop in the vast vast ocean of human knowledge.

In the things where I’m “intermediate”, I don’t feel like I make a lot of progress after those first 1-2 years of structured learning. Maybe because the rest all needs to be self-guided? I wish there was more structure out there for intermediate anything, to at least learn more about what I need to learn. 

I probably just need to learn to embrace the journey that is being “intermediate” and find ways to enjoy the process more.

CHAPTER EIGHT: ACCOUNTABILITY

Yesterday's confession had me thinking about the "looking busy" aspects of building or working "in the open". The thought had crossed my mind that I could just avoid posting anything until I'd built up a ton of writing that I would push all on one day in order to make up for all the days I'd missed, then make it seem as though I had been perfectly productive the whole time. I'm not sure why I thought of that. Maybe the shame of missing days, but then I'd never actually committed to a specific number of words per day, just the overall monthly goal of 50k, which in theory could have been 50k all at once.

I've mentioned before that I'm more of a "point me in a direction and then let me work on it independently" kind of worker, so having this daily-posting accountability feels kind of weird, even if I'm the only one holding myself accountable. And this working style is great for some of my work, where it's almost entirely dependent on just me to get it done, but a bigger proportion of my work is collaborative and can't be done by myself. For those things, I definitely envy people who are more collaborative and do best with social aspects of working in a group. I just get kind of antsy when I'm in a group project and I'm waiting around for someone else to hand something off to me, because I think about all the other things I could be doing while I'm waiting for other people to get around to my handoff.

There’s probably also a toxic side to “building in the open” or otherwise actively involving people in touch points that they can’t really influence themselves. It strikes me as a sort of “performative productivity” where people make their to-do list into something that needs to be witnessed by others while they’re actively doing it. If there’s more than 2-3 people at the table, but it’s only two people talking to each other the whole time, I feel like it becomes “performative” in the sense that all the other people are really just an audience to the two people having the discussion. Even worse, the meeting is just one person talking at everyone else, with no discussion. There’s some edge cases there, like presentations or lectures that are designed to be seminar-style dissemination of information, rather than discussion, but I think we all know the type of meeting I’m referring to, where you can go the entire meeting without ever saying anything. 

There's some caveats to that too, where even if the meeting is designed to be discussion focused, it turns into just a few people dominating the conversation due to power dynamics or personality traits. Sometimes the meeting is just too many people and there's no chance for everyone to weigh in on every topic. I think a thoughtful agenda can negate some of that, and I'm a fan of the "Inform, Discuss, Decide" format where everyone gets a set of "pre-read" information to help them prepare for the meeting or orient to the material, then a handful of discussion points with discrete decisions that have to be made.

I have a lot of opinions about meetings because I spend probably the majority of my week in some kind of meeting, whether for my full-time job work or committees that I volunteer to sit on or for my professional society involvements or for my own career development mentorship. There’s some I look forward to and some I dread. 

I will say – and this probably marks me as a "manager" – that there's some interesting tradeoffs with in-person meetings versus virtual meetings. I think virtual meetings, overall, tend to be more productive because there's a layer of impersonal-ness to them, being through a screen or a phone. In-person meetings, while less productive, are more emotional (for better or worse) due to the interpersonal-ness of seeing or sensing people's body language. Both are good, but for different reasons. I actually prefer 1-1 meetings to be in-person, but meetings with more than 3-4 people to be virtual – a 1-1 is more about the interpersonal relationship while the larger meetings are more about getting something decided. Well, usually the big meetings are about getting something done, but not always; there are certainly some larger meetings that are also about interpersonal relationships or general vibe checking, which is best done in-person versus remote.

On one hand, then, I understand the "return to office" argument of morale and culture building. I do think hybrid teams end up being more productive overall, having a mix of both, although I'm not using any statistical data to back that up, just my own vibes and my own bias from a scientific background, where at least some work must be done in-person at the bench and some work, like computational analyses, can be done remotely.

Getting some time to work remotely gives me space to think a little deeper. I also prefer the "give me a direction and then let me work" management style, so it makes sense that I wouldn't want to be constantly "building in the open" with collaborative group-project style work; I'd rather get to some meaningful milestone before I share what I've been working on.

While that's what fits my work style best, I also completely understand it's not always practical or responsible to go near radio silence for stretches of time while I'm working things out (or, as yesterday proves, not doing anything…); there needs to be some way to measure my progress, my productivity, and ensure I'm meeting expectations and not stuck or blocked and not asking for help. Sometimes that can be as simple as keeping a running document with analyses or outcomes that can be accessed by all the stakeholders – for example, when I'm writing a paper with multiple coauthors, using a shared document where they can see my progress (or lack thereof) and adjust their expectations or reach out with questions and comments accordingly. Sometimes, it's just shooting a quick email saying "hey, this and that to-do item is on my list, I haven't forgotten, I just got stuck in step X", like when someone is waiting on a figure or a dataset. Having some way for my managers or my mentors to get visibility into what I'm doing (while giving me the space to do it) requires that I understand whether my manager or mentor is okay with that approach.

Maybe that’s why the CONFESSION of not having made progress was hard for me to own up to. There’s no mentor or manager here, just me holding myself accountable, and that was tough.

CHAPTER SEVEN: CONFESSION

Well, clearly I stalled out during the conference. In my defense, I did actually stay pretty busy and made the most of the in-person face-to-face meetings, so I’ll give myself some space for grace. Admittedly, I actually sat down a couple of times to churn out some words, but the combination of the blank screen and the shame of having missed days had me deleting whatever I wrote and laying in bed instead. I’ll have a lot of chapters and words to make up if I’m going to hit the 50k goal by the end of the month, but there’s still time! There’s still a chance! 

In part, I probably just need to make this easier. I’m already noticing I’m holding myself up to a certain quality standard, when the whole point is to break that habit and just get myself into a quantity mindset where editing can happen later, and the main goal is to just get words onto the screen, even if they’re deleted later. It’s that “editing as I go” process that holds myself back in my grant writing and manuscript preparations, where I have some ideal story in my head that I never end up getting out onto the screen because I’m mentally in the revising stage before I even have anything to revise. If that doesn’t make sense, you’re probably better at grinding out that first draft than I am!

Funny enough, I’ve also had a lull in my leisure reading, so I wonder if there’s also some correlation between consuming content and generating it? It might also just be a spurious correlation, where I read for leisure when I have “spare” time, and I also write more when I have spare time; the basis is having spare time for both, but they end up looking correlated from dependence through that shared variable.

I’m also just openly admitting that I’m not writing as much as I hoped. Earlier today, I confessed to not making any progress on a manuscript for which I’d promised to have some updates by now, and while it wasn’t surprising or really blocking anybody’s work from getting done, it felt good to just admit that I’d dropped the ball. It doesn’t feel good to not have made progress, but it felt somewhat relieving to just admit that I’m struggling.

So hopefully this can be the day that I get myself back into the swing of things, and make a renewed effort to build up the writing habit!

CHAPTER SIX: CONFERENCE

I missed a day. Zero words written yesterday. But I’m back in the saddle now, coming at you live (-ish) from Toronto for the Human Proteome Organization’s annual conference.

The conference has, of course, been top of mind, and I thought I'd take a whack at the trite, over-done "conference advice" content. So here's my take on that, trying to both add something new and reiterate the common yet important bits.

First, bring comfy shoes. Big conferences like HUPO and the American Society for Mass Spectrometry (ASMS) and JP Morgan Healthcare (well, the BIO partnering part at least) are so spread out that having uncomfortable shoes is going to ruin your whole week. Comfy shoes can still be cute, but break them in before the conference at least. This was the very first advice I got when I went to my first ASMS and it’s still true. I think about it every time I pack for a conference.

Second, survival kit! I like to have some minimal first aid (bandaids, pain killer) in my work bag all the time, but for conferences I also stock my hotel room with water, granola bars, and some treats either from a grocery store or a convenience shop. Conference venue food is almost always overpriced and under-nutrition’ed, so having (somewhat) healthier options in my room keeps my costs down and helps make sure I’ll get through the week with a somewhat functional body and mind.

Third, mental health. To people who know me, it might be surprising to hear that I’m actually pretty introverted. To the people who _really_ know me, it’s not a surprise at all. I think there’s a two-by-two matrix: introverted vs extroverted, socially skilled or not. So some people are the stereotypical extroverted and socially skilled; some are introverted and not so socially skilled. But there’s also people who are extroverted and not so socially skilled. I’m thinking of people who don’t pick up on when the conversation has moved on or talk way too close in your personal space, stuff like that. And I’m maybe socially skilled but get quickly drained without quiet chances to recharge. So if you know you’re introverted (either the socially skilled or not so skilled) then make sure you’re intentionally scheduling some recharge-time for yourself so that you can show up for social events with the right energy.

Other things for mental health might be packing some exercise clothes to get in a run and making sure you schedule intentional times to get that run or gym session in. You might schedule a sight-seeing break. You might schedule some calls home to talk to friends or family during a quiet moment. Maybe you even take a nap. Which leads to my last thought –

Finally, make sure you're getting the full value out of the conference. The main value of a conference is the in-person meetings and connections. The networking sessions and the happy hours, the poster sessions and the coffee line chats. Maybe this is controversial, but in the age of preprints and webinars, I'm not so sure the value of a conference is in going to oral sessions. Not to mention how packed and overwhelming conference schedules like HUPO and ASMS are, where every hour is filled with content. If you spend all day running from one session to another, it's unlikely you'll have the energy to show up fully for the networking parts of the conference that count the most. So if your mental health is going to be better with a quick nap or reading a few chapters of a book, then by all means skip that afternoon oral session so you can recharge for the poster session. Conferences like HUPO and ASMS are marathons, not sprints.

Putting in “face-time” is of course the main point, but you also want to build on those initial connections. Having physical business cards or at least an up-to-date LinkedIn profile will help you instantly exchange contact information with the people you meet. If you’re like me and have trouble remembering names and faces, you can write a quick email or LinkedIn message to the people you meet as soon as you can, and note in your message how you met them and what the next steps might be – “Hey, Bob, it was great meeting you at the ABC happy hour, I really enjoyed talking with you about sample prep. I’d love to get the link to the paper you mentioned, please send that over when you get a chance after the conference!” Something like that.

Some of that last paragraph starts to veer into another topic, “PR”/branding/”reputation” for scientists or other professionals, but that might be another topic for a different day.

There’s a few things I specifically have been doing that help me. Some are kind of silly. For example, I like to get to the conference at least one full day before the conference itself actually starts, and use this time in the physical conference setting to get myself in the mental mindset for the conference. I usually finish up some slides or scout some quiet 1-1 meeting spaces, and I get my “conference persona” on – I do my nails, put on a face mask, and lay out my clothes for the week. Once I’m all organized, I feel “ready” to take on the week, and usually I end up getting a chance to grab coffee or beers with someone else who arrived early, and I can kind of get in a “warm up” socialization event before the rush of the conference really starts. This is also my chance to schedule some grocery delivery or hit a store to stock up on my essentials: water, electrolyte mix, protein bars, and some kind of gummy candy. A girl needs a treat to look forward to when she gets back to her hotel room after a long day of socializing, okay?!

CHAPTER FIVE: FASHION

Not much time tonight before my early morning flight, but sometimes building a habit is just showing up consistently even if the effort isn’t quite the same every time. 

Today I’m thinking about the cyclic nature of trends. I’ve finally hit the age where I’ve lived my adult life through some clothing and style fashion cycles, where the things I wore as a teenager or young adult are now considered “cool” again. I can’t pull them off anymore, of course, things like crop tops and low-rise bellbottoms, but it’s jarring to drive past the university and see the students in stuff I might’ve worn when I was their age. My clothing definitely dates me now. I’m told nobody wears skinny jeans.

My PhD advisor likes to say that "everything old is new again", and while it's obviously true of clothing, music, and other fashions, he's usually referring to science. The trends of what's hot or popular in science cycle just like white eyeliner or hair scrunchies. I've found that mantra to be pretty inspiring, because you can look back at what was being published and hyped 20-30 years ago, and almost any one of those ideas replicated today will generate the same interest since we have new technologies to apply. Some of those ideas were extremely elegant in concept, but were ahead of their time. Resources like hardware or software weren't fast enough or sensitive enough to make those concepts really shine, so revisiting them today with the tools we now have available gives them an intellectual resurrection. For example, I'm looking back at Nature journal's 1995 articles and seeing a lot of protein crystal structure work; just last year, the Nobel Prize in Chemistry was awarded for protein structure prediction. (I'm also seeing a lot of transcription factor work, so… hmm…)

Now if mass spectrometer vendors could figure out how to make a clear, see-through instrument like the vintage iMac computers, we’d really be cooking.