Thursday, October 1, 2015

Practically relevant software engineering research

A paper about proactive conflict detection, which is part of my speculative analysis research project, has been ranked the most practically relevant software engineering research of the last five years.  In a survey of 500 software developers, the proactive conflict detection paper was rated as "essential" to software development practice by the most participants, out of 600 papers published in the last five years.

Proactive conflict detection helps people to collaborate more effectively.  When teammates work in parallel, they may make changes that are independently good but which, when combined, break the software.  The Crystal tool continuously tries to merge different people's changes, before the software developers do so and without making permanent changes or interfering with the developers.  If the changes are in conflict, then Crystal immediately and unobtrusively notifies the developers so they can fix it while the changes are fresh in their minds, and before they waste time on code that would have to be reworked or discarded.  If the changes are not in conflict, then developers can proceed with confidence, without worrying about negative consequences.  In either case, developers can spend less time coordinating with their teammates and more time getting their jobs done.

The research project was joint work with Yuriy Brun, Reid Holmes, and David Notkin.

To learn more about proactive conflict detection, watch a video or read the technical paper:  

Yuriy Brun, Reid Holmes, Michael D. Ernst, and David Notkin, Early Detection of Collaboration Conflicts and Risks, IEEE Transactions on Software Engineering, 2013.

Other papers resulting from the project appear on the speculative analysis webpage.

The industrial relevance study was performed by Microsoft Research and Singapore Management University.  A paper describing it was published at the ESEC/FSE conference in September 2015:

David Lo, Nachiappan Nagappan, and Thomas Zimmermann, How Practitioners Perceive the Relevance of Software Engineering Research, ESEC/FSE, 2015.

A tip of the hat to Yuriy Brun for pointing me at this paper.

This is a nice complement to a previous honor. In 2013, Microsoft Academic Search ranked me 2nd among software engineering researchers worldwide, for work in the previous 10 years.

Friday, July 10, 2015

No German in the building

As my sabbatical comes to a close, I appreciate even more the remarkable talents of Peter Druschel, who hosted my previous sabbatical and has built up MPI-SWS into a world-class research institution.

One of Peter's rules was to prohibit speaking German at work.  If you were caught breaking the rule, you would get a gentle Teutonic reprimand.  I heard some grumbling that Peter even enforced this rule when playing soccer at lunchtime.

Although German would have been comfortable for some people, it would have excluded others.  Equally importantly, Peter wants even students who are not native English speakers to be perfectly comfortable in English, for example so that they can network effectively at conference dinners.

Peter found he had to adjust his rule from "no German in the building" to "only English in the building" when he discovered people speaking Hindi in their offices.

During this year's sabbatical, at IMDEA, conversations were in a mix of Spanish and English, but mostly English.

At UBA, the technical work is mostly in Spanish, except when I am around.  At lunchtime and during informal preliminaries, I speak Spanish, but I find it easier to be precise in English.  The faculty I interact with are completely fluent in English, and they are glad that I am forcing the language to be English since that is good practice for the students.  They enthusiastically adopted Peter's original "no German in the building" rule.  (Only Sven Stork and I could have objected, and my German is too primitive for Sven to ever want to hear it!)  But they weren't willing to upgrade to Peter's current "only English" rule.

Sunday, June 21, 2015

How to move a project from Google Code to GitHub

Google Code is shutting down, so you need to move your projects to a new hosting site.

This article tells you how to export a project from Google Code to GitHub.  Google Code provides an "Export to GitHub" button at the top of your project pages.  That button uploads the repository to GitHub (first converting it to Git if needed), but it doesn't always work; even when it does, there is a lot more to moving an entire project.  This blog posting takes you through the extra steps that you need to perform.  Some of these steps ought to be done by the "export to GitHub" button, but the export tool is incomplete.  Others are necessarily manual steps.
  1. Warn your teammates that you are going to move the repository; give them a chance to push their changes or do other cleanup.  Ask them not to make any more changes to their clone, ever.
  2. I suffered authentication problems when I clicked the "Export to GitHub" button in parallel for multiple projects in different browser tabs.  Click it for one project; go through GitHub authentication, etc.; and then you can click the "Export to GitHub" button for the next project.
  3. The "Export to GitHub" button won't work if your repository is too big or if it ever contained a file larger than 100MB.  In this case, you need to do the conversion by hand.
    1. Create the new repository on GitHub
    2. Run the following:
      cd DIRFORGITCLONE
      git init
      fast-export/hg-fast-export.sh -r HGCLONE
      git checkout HEAD
      git remote add origin git@github.com:USERNAME/PROJECTNAME.git
      git push --set-upstream origin master
    3. If the push failed because your repository contains files that are too large, use the BFG Repo-Cleaner to remove them, then redo the push.  Sample command lines:
      bfg --strip-blobs-bigger-than 100M
      git gc --aggressive --prune=now
    4. Migrate the issues:  see https://code.google.com/p/support-tools/wiki/IssueExporterTool .  I didn't have a problem with the throttling; issues seemed to be uploaded faster than one per second.
  4. Redo the repository export if you didn't do it manually in the previous step.
    The reason is that often, the tool run by the "Export to GitHub" button says that repository conversion was successful, but in fact some history is lost or munged (especially around merges and file renames).  One way to identify problems is to run  git log --graph  and look at the end of the output.  There should be just one root, which is from the beginning of the history; if there are multiple roots or a root that is not at the beginning of the history, then you must regenerate the history from the Mercurial repository.  However, I recommend that you always do so, just in case.
    Run the following commands:
    cd ~
    git clone https://github.com/frej/fast-export.git
    REPO=myrepositoryname
    cd REPOSITORY_PARENT
    rm -rf $REPO
    git init $REPO

    cd $REPO

    ~/fast-export/hg-fast-export.sh -r HGCLONE --force  -A ~/authornames.txt

    git checkout HEAD

    git remote add origin git@github.com:mygitusername/$REPO
    git push --force --set-upstream origin master
  5. The GitHub repository appeared in your personal GitHub account.  If you collaborate with other people on the project, move the repository to an organization's account by browsing to "Settings > Transfer ownership".
  6. Rename your old clone of the Google Code repository, and clone the new GitHub repository.
    If you manage your clones using a program such as mvc, then update its control file, such as ~/.mvc-checkouts.
  7. If the old Google Code repository used Mercurial:
    1. convert the ignore file from Mercurial to Git:
      git mv .hgignore .gitignore
    2. Remove any occurrence of syntax: glob, and convert any regex patterns in the .gitignore file into globs.  Convert other patterns as necessary.  Take care because git interprets patterns differently depending on whether they contain a slash ("/") or not.
    3. Build the project and run tests to create temporary files that should be ignored.
    4. Run git status to ensure that they are ignored.
    5. git commit -m "Rename .hgignore -> .gitignore" .hgignore .gitignore
  8. At the top of the GitHub page for the project (not the GitHub Pages page):
    1. Click the red "Stop ignoring" button
    2. Add text for the description field (from Google Code description)
    3. Add a link to a real homepage for the project.
      If it doesn't have one already, then use http://USERNAME.github.io/PROJECTNAME/ (this is a better choice than the wiki), and create this homepage per the next step.
  9. Create a homepage for the project, if it doesn't have one already.
    A good way to do this is to create a GitHub Pages (github.io) homepage at http://USERNAME.github.io/PROJECTNAME/:
    git checkout --orphan gh-pages
    git rm -rf .
    rm .gitignore

    # Create index.html, then:
    git add index.html
    git commit -a -m "First GitHub Pages commit"
    git push --set-upstream origin gh-pages
    Browse to http://USERNAME.github.io/PROJECTNAME/ to verify the content.
  10. If your project doesn't already have a README file, then create a README file (or maybe README.md), which will show up at the bottom of your GitHub page.  If you are creating the README purely for the benefit of people browsing on GitHub, and your real documentation appears elsewhere, then your README will be short and will redirect people to the project's real homepage.  Your project ought to include two different types of documentation:  for users and for contributors.  These often appear in different places but should be easy to find.
    You should also include a LICENSE file.  Otherwise, nobody (who is under the control of a legal department) will be able to use your software.
  11. If your wiki is intended as developer-written documentation, then move all wiki pages to GitHub pages or to the main repository: see https://github.com/morgant/finishGoogleCodeGitHubWikiMigration .  (The instructions are hard to follow; read them carefully!)
    Only keep the pages in a wiki if you expect users to edit it.
    The import process creates a "wiki" branch.  If this branch is useless (especially if it contains only one file or is redundant with the project's homepage or GitHub Pages page), then delete it by clicking the appropriate trashcan icon at https://github.com/USERNAME/PROJECTNAME/branches
  12. Look up the permissions on Google Code at https://code.google.com/p/PROJECTNAME/adminMembers and give corresponding permissions on GitHub at https://github.com/USERNAME/PROJECTNAME/settings/collaboration .  Users in the Owners group don't get notifications, so put all of those people also in some other group.
  13. Convert post-commit hooks to GitHub; see "Post-Commit URL" and "Post-Commit Authentication Key" at https://code.google.com/p/PROJECTNAME/adminSource and add them at https://github.com/USERNAME/PROJECTNAME/settings/hooks/new .
  14. Add email notifications at https://github.com/USERNAME/PROJECTNAME/settings/hooks/new?service=email corresponding to the "Activity notifications" at https://code.google.com/p/PROJECTNAME/adminSource .
  15. Copy files in the Google Code "Downloads" section to elsewhere.
  16. The project's documentation and buildfiles probably refer to Google Code.  Update all such mentions.
    1. Update direct references to files in the repository:
      preplace "https?://PROJECTNAME.googlecode.com/hg/"   "https://raw.githubusercontent.com/USERNAME/PROJECTNAME/master/"
    2. Search for more occurrences of googlecode.com and code.google.com, and edit each one manually.
    3. If the old Google Code repository used Mercurial, search for occurrences of hg, and edit each one manually.
  17. Write up the change of repositories for your changelog or release notes.
  18. Run tests, then commit and push your changes to GitHub.  For example:
    git commit -a -m "Update references from Google Code to GitHub and from Mercurial (Hg) to Git"
  19. If the push created or changed your project's homepage, run a link checker to check for broken links:
    plume-lib/bin/checklink -q -r `grep -v '^#' plume-lib/bin/checklink-args.txt` MYURL
  20. Update your continuous integration server so that it refers to the new repository location.
  21. If you have a README.html or similar HTML file in your Google Code repository, then external links to it may exist.  In the old Google Code repository, edit it to add a line such as
      <meta http-equiv="Refresh" content="1;URL=http://USERNAME.github.io/PROJECTNAME/" />
    to the header, and commit and push to Google Code.
  22. Set up forwarding ("project moved") at Google Code, at https://code.google.com/p/PROJECTNAME/adminAdvanced .  Forward to the project's homepage or new GitHub Pages homepage.
    If you ever need to temporarily undo the forwarding, go to https://code.google.com/p/PROJECTNAME/adminAdvanced .  Google Code may or may not remember the forwarding URL; you will need it when you re-enable forwarding.
  23. Tell anyone who might have cloned the old Google Code repository to rename their clone, not use it any more, and instead clone and use the new GitHub repository.
If you have any corrections or additions to this guide, let me know.
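
Two of the checks in the steps above (counting the roots of the converted history, and finding ignore patterns that still look like Mercurial syntax) are easy to script.  The following is only a minimal sketch, not part of the procedure itself.  The function names count_roots and find_regex_like are made up for illustration, and the pattern heuristic is a rough assumption about what unconverted regex patterns look like, so review every hit by hand.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from any tool.

# Print the number of root commits (commits with no parents) reachable
# from HEAD in the Git clone at $1 (default: the current directory).
# A correctly converted history should have just one.
count_roots() {
  git -C "${1:-.}" rev-list --max-parents=0 HEAD | wc -l | tr -d ' '
}

# Print, with line numbers, the lines of ignore file $1 that still look
# like Mercurial syntax: "syntax:" directives, or patterns containing
# regex-style characters (backslash, ^, trailing $, parenthesis, |, .*).
# This is a heuristic (an assumption), not a real regex-to-glob converter.
find_regex_like() {
  grep -n -E '^syntax:|\\|\^|\$$|\(|\||\.\*' "$1"
}
```

For example, run count_roots in the new clone and compare the result against 1, and run find_regex_like .gitignore after editing the file.  The second command ideally prints nothing, though a legitimate glob that uses a backslash escape will still be flagged.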

Tuesday, April 28, 2015

Declarative specification of FSM-inference algorithms

The paper “Using declarative specification to improve the understanding, extensibility, and comparison of model-inference algorithms” recently appeared in IEEE Transactions on Software Engineering.  The paper is by Ivan Beschastnikh, Yuriy Brun, Jenny Abrahamson, me, and Arvind Krishnamurthy.  The title is a mouthful, because the paper accomplishes quite a bit, but I'll try to break it down here.

The models that the paper speaks of are finite state machines (FSMs), like this one that expresses some legal sequences of commands that a client program might issue to a mail program:


It is useful to have an FSM that expresses the behaviors of a system.  Unfortunately, the system designers may not have written one down, or the one they wrote may not be accurate.  An alternative is to infer an FSM from observed system executions.  This is such a good idea that many researchers have published such model-inference algorithms.


Computer scientists usually write algorithms as pseudocode, and model-inference algorithms are no exception.  Pseudocode is familiar to programmers and enables the inference algorithm to be implemented, but it is not helpful in understanding the inference algorithm.  The extensive proofs in any algorithm textbook show the limitations of reasoning about pseudocode.

The paper proposes InvariMint, an approach for specifying model-inference algorithms declaratively.  The algorithm designer specifies what properties of the input log the algorithm should preserve, rather than how the algorithm works in terms of programming-language statements.  For instance, the algorithm designer might say, "If event A is always followed by event B in the log, then the algorithm should output a model in which event A must always be followed by event B."  This circumscribes the types of generalization that the algorithm is allowed to perform.  The algorithm designer doesn't have to figure out how to encode this in a programming language or verify that the inference implementation is correct.
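
To make the "A is always followed by B" property concrete, here is a minimal sketch that checks it against a single log.  This is not InvariMint's implementation: it does no mining and no FSM construction.  It assumes a hypothetical log format with one event name per line, that A and B are different events, and the function name always_followed_by is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if every occurrence of event $1 in log
# file $3 is eventually followed by event $2, and exit 1 otherwise.
# Assumes one event name per line and that $1 and $2 differ.
always_followed_by() {
  awk -v a="$1" -v b="$2" '
    $0 == a { pending = 1 }          # saw A: now waiting for a later B
    $0 == b { pending = 0 }          # saw B: every earlier A is satisfied
    END { exit (pending ? 1 : 0) }   # fail if some A never saw a later B
  ' "$3"
}
```

On a log containing open, read, close the check always_followed_by open close succeeds.  On a log containing open, close, open, read it fails, because the second open is never followed by a close.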


Here is how InvariMint works, in a nutshell:

  • At design time, the algorithm designer decides what sorts of properties the inference algorithm should express about a trace.  The designer generalizes these properties to patterns and specifies them as FSMs.
  • At run time, given an execution trace (a log), the algorithm mines matches for each of those patterns from the trace.  The algorithm intersects all the mined FSMs to obtain the final inferred model for the program.


We applied the InvariMint declarative approach to two model-inference algorithms that had previously been specified procedurally.  Specifying them declaratively with InvariMint (1) leads to new fundamental insights and better understanding of existing algorithms, (2) simplifies creation of new algorithms, including hybrids that combine or extend existing algorithms, and (3) makes it easy to compare and contrast previously published algorithms.  Furthermore, the InvariMint-generated algorithms were significantly faster than the original procedural versions of the algorithms.

You can read the paper here.  The InvariMint implementation is distributed along with Synoptic.

Monday, April 13, 2015

NSF GRFs for Pavel Panchekha and Doug Woos

I am delighted that two of my students, Pavel Panchekha and Doug Woos, have won NSF graduate fellowships this year.

Pavel applies his mathematical background to problems in compilers and verification.  Pavel is co-advised by Zach Tatlock.

Doug works in the intersection of systems, networks, and programming languages.  Doug is co-advised by Tom Anderson.

Both Doug and Pavel are authors of our forthcoming PLDI paper on verifying distributed system implementations.

Overall, UW CSE garnered 9 NSF GRFs: more than CMU, MIT, or Stanford, and only one fewer than Berkeley, which has a larger student pool than UW.  UW's success is mostly because our students are terrific, but also because we advocate fiercely for their success.