RSS: Should stand for Real Simple, Sorta

Submitted by Barrett on Sun, 07/26/2015 - 09:48

For ages I've thought I should get this site included on the Drupal Planet feed, and with the new build in D8 and a renewed momentum to actually write here, this seemed like a good time to do it.  Should be simple, right? Tag my content, allow commenting, submit the RSS feed URL for inclusion, profit.  Not so much.

The hang-up is that the RSS generated by the core taxonomy term feed view doesn't validate, according to the W3C validator; it looks like it's choking on unencoded HTML tags in the description fields, which aren't wrapped in CDATA sections.
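For illustration, here's a minimal sketch (in Python rather than Drupal's PHP) of the two standard ways to make embedded HTML safe inside a feed's description element: entity-encoding the markup, or wrapping it in a CDATA section. The sample markup is hypothetical.

```python
from xml.sax.saxutils import escape

description_html = '<p>Some <strong>rich</strong> body text</p>'

# Option 1: entity-encode the markup so the feed stays well-formed XML
encoded = '<description>%s</description>' % escape(description_html)

# Option 2: wrap the raw markup in a CDATA section, which parsers treat as text
cdata = '<description><![CDATA[%s]]></description>' % description_html

print(encoded)
print(cdata)
```

Either form passes an XML parser; raw, unencoded tags inside the element do not.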

Yes, this is something I can fix in a few different ways, but doesn't it seem like an RSS feed view should just automagically handle all that dreck?


Finding all the redirects for a content type

Submitted by Barrett on Sat, 07/25/2015 - 12:55

We're in the middle of a Drupal-to-Drupal migration of one of our content types (blogs), and a stakeholder pointed out that they have a large set of redirects for that content type which will need to be migrated. That raises the question of how to find all the redirects.  The easy thing to do is to list the redirects where the source is the node ID of a published blog.

select dn.nid, dn.title, r.* from drupal_node as dn join drupal_path_redirect as r on dn.nid = substring(r.source, 6) where dn.status = 1 and dn.type = 'blog_post'
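A side note on that query: Drupal stores node redirect sources as internal paths like node/123, so substring(source, 6) strips the five-character "node/" prefix (SQL substrings are 1-indexed). A rough Python equivalent of that slice, with a hypothetical helper name:

```python
def nid_from_source(source):
    """Return the node ID from a 'node/<nid>' redirect source, else None."""
    # source[5:] is the 0-indexed twin of SQL's substring(source, 6)
    if source.startswith('node/') and source[5:].isdigit():
        return int(source[5:])
    return None

print(nid_from_source('node/123'))  # prints 123
print(nid_from_source('my-alias'))  # prints None
```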

The problem is that redirects can be created where the source is an arbitrary string, like, for instance, the URL alias for a node.  So the list above has to be augmented with redirects where the source is an alias of a blog.

select dn.nid, da.src as alias_source, da.dst as alias_destination, r.source as redirect_source, r.redirect, r.type, from_unixtime(r.last_used) as last_redirect_use from drupal_node as dn join drupal_url_alias as da on substring(da.src, 6) = dn.nid join drupal_path_redirect as r on da.dst = r.source where dn.status = 1 and dn.type = 'blog_post'

Even with that, I don't have complete faith that I've found all the redirects, but I can't see a way to find anything that might be missing (at least, given that the redirects I'm interested in are configured in Drupal and not in .htaccess or the CDN).


Adventures in D8'ing

Submitted by Barrett on Sat, 07/25/2015 - 10:07

It's time to make the jump to D8.  Let's face it, my site is mostly a haphazardly maintained blog; it could really be done in GitHub Pages without any real loss of functionality. There's nothing beyond core that I need, so I might as well make the jump now.

First thing I notice: is pathauto still not in core? REALLY? When was the last time you saw a site using node/xxx paths?


Drupal4Gov Half-Day Security Session

Submitted by Barrett on Mon, 04/06/2015 - 19:59

Mid last month, Dave Stoline and I presented the security session at the Drupal4Gov half-day event at OPM. The PDF of the slides is attached here. Dave focused on common security vulnerabilities and working with the Drupal Security Team, while I talked about security-related trends like the use of a separate edit domain and HTTPS everywhere.



Web.config as a honeypot

Submitted by Barrett on Fri, 12/12/2014 - 08:43

One of my clients has started using an automated security scanning tool to regularly crawl their Drupal site looking for vulnerabilities. In its first run, the tool identified only one issue: a "predictable resource location" vulnerability based on the presence of the web.config file in the Drupal docroot. Since web.config is used by IIS servers and they're not on IIS, the presence of the file isn't really a security issue for them, but if the scanner is going to complain, it's easy enough to remove it from the code base. The web.config file is commonly flagged by such scanners and just as commonly removed from the code base.

What I hadn't considered before is an idea proposed by the client's tech lead: monitoring access attempts on the web.config file as a honeypot for malicious traffic. Since they're not using IIS, any access of that file may well indicate a drive-by vulnerability scanner. Obviously what you do in response could be complicated, and you wouldn't want to automatically block the IP or anything, but flagging the IP for investigation is reasonable and easy enough to do, especially if you're using a log monitoring tool like Sumologic where you can set up automatic alerts.
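As a sketch of what that flagging could look like (the log lines, IPs, and function name here are hypothetical, and the regex only approximates the Apache combined log format):

```python
import re

# Pull the client IP and request path from a (simplified) combined-format log line
REQUEST_RE = re.compile(r'^(\S+) .*"(?:GET|POST|HEAD) (\S+)')

def suspicious_ips(log_lines):
    """Return the set of client IPs that requested web.config."""
    flagged = set()
    for line in log_lines:
        match = REQUEST_RE.match(line)
        # On a non-IIS site, any request for web.config is almost certainly a scanner
        if match and match.group(2).endswith('web.config'):
            flagged.add(match.group(1))
    return flagged

logs = [
    '203.0.113.7 - - [12/Dec/2014:08:43:00 +0000] "GET /web.config HTTP/1.1" 403 212',
    '198.51.100.2 - - [12/Dec/2014:08:44:10 +0000] "GET /node/1 HTTP/1.1" 200 5120',
]
print(suspicious_ips(logs))  # only the scanner's IP is flagged
```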


Personally, though, I think it's more maintainable to add a deny rule for it in the .htaccess file. That way you don't have to remove the file again each time you update core; you already have to merge your .htaccess, which often has custom rules anyway. A simple rewrite rule like the one below does the trick:

RewriteRule ^/?web\.config$ - [F,L]


Periodic Assignment module

Submitted by Barrett on Sun, 11/02/2014 - 11:57

I've been toying with an idea for what I'm presently calling the Periodic Assignment module. The basic idea is that a user should be able to subscribe to get intermittent assignments from a Drupal site. The user would then complete the assignment and the site would log its completion. For instance, given a set of different exercises, I could have the site email me once an hour with a random exercise to do. Or, for people with a practice of journaling, the site could have a list of topics on which they could write, and once a day it would send them one of those topics as a prompt.

It's still very much conceptual at the moment, but the idea is solidifying as I mull it over. If you're interested, what documentation/planning I've done so far is at the link above.


The Conflicted Developer

Submitted by Barrett on Sat, 10/04/2014 - 12:28

"The only thing that saves us from the bureaucracy is inefficiency. An efficient bureaucracy is the greatest threat to liberty."

- Eugene McCarthy

I always feel conflicted about my role as a developer when thinking of this quote from McCarthy. As a developer, I want the most efficient possible data recording, indexing, and searching. At the same time, though, in the past it was inconceivable to implement mass monitoring of the public because the effort to record, store, and correlate the information was simply too great. Clearly that's no longer the case (or at least, is rapidly becoming less of the case), as witnessed by the NSA monitoring programs, the Great Firewall of China, etc.



What does the White House's Executive Order mean for Open Government?

Submitted by Barrett on Sat, 08/10/2013 - 09:14

I wrote the following piece after looking into the Executive Order on Open Data. The original is published at…

The White House's Executive Order of May 9 will cause a shift in the way Federal agencies present data. The Executive Order, “Making Open and Machine Readable the New Default for Government Information,” mandates that “the default state of new and modernized Government information resources shall be open and machine readable.”

So what does this mean for Federal agencies? Well, that’s a bit nebulous as of right now. But the Order from the White House does define three milestones for issuing further guidance and clarification, of which two have been met. The third was due to be complete August 7, but has not yet been released. To date, the OMB has released the Open Data Policy memo and the White House has published Project Open Data, an online repository of tools and implementation information.

What is not yet available is the Cross-Agency Priority (CAP) Goal due from the Chief Performance Officer. While the Open Data Policy memo and Project Open Data likely provide sufficient detail around which to effectively implement a program in keeping with the goals of the Executive Order on open data, it is the CAP Goal against which Federal agencies must report their progress.

Did I mention the first of those reports must be made by 180 days from the date of the Order? That means federal agencies must first report by November 5, 2013, with subsequent reports made quarterly.

What we do know is that the Open Data Policy memo defines five major requirements:

1. Collect or create information in a way that supports downstream information processing and dissemination activities
2. Build information systems to support interoperability and information accessibility
3. Strengthen data management and release practices
4. Strengthen measures to ensure that privacy and confidentiality are fully protected and that data is properly secured
5. Incorporate new interoperability and openness requirements into core agency processes

These top-level requirements shouldn’t be anything Federal agencies aren’t already doing. Of course, each top-level requirement contains sub-requirements, and those are where the rubber meets the road. Four of these sub-requirements in particular are likely to be new activities for Federal agencies, so they are worth highlighting here:

The first requirement, 1a, mandates the use of “machine-readable and open formats.” This means that information must be stored in a format which can be easily read by a computer without loss of meaning (machine-readable) and that the format should be public rather than proprietary (open formats).

The second requirement, 1c, calls for the use of open licenses for released data sets. Open licensing of released data ensures that the data can be used with no restrictions on how it is transmitted, adapted, or otherwise used for either commercial or non-commercial purposes.

The third requirement, 3b, mandates that agencies establish and maintain a public data listing. Specifically, agencies must host a www.[agency].gov/data URL which provides a listing of the datasets which could be made publicly available. This listing must be available in both human-readable (i.e., HTML) and machine-readable formats, in order to allow Data.gov and other aggregators to discover agency data sets. The maintenance clause here is key: the listing must be maintained and updated over time to comply with the requirement. Additionally, the listing should include not just those data sets which are available but also those which could be made available. That means if your agency is working with a data set but has not fully cleaned it and stripped out PII, it should still be listed at your /data URL.
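As a rough sketch of what one machine-readable entry in such a /data listing might look like (the field names follow my reading of the Project Open Data metadata schema and the values are invented, so check both against the schema itself):

```python
import json

# Illustrative entry for an agency's /data listing; fields are based on the
# Project Open Data metadata schema but are not authoritative.
entry = {
    "title": "Agency Widget Inventory",
    "description": "Counts of widgets managed by the agency, by fiscal year.",
    "keyword": ["widgets", "inventory"],
    "modified": "2013-08-01",
    # Data sets that only *could* be released still get listed, with a
    # non-public accessLevel value.
    "accessLevel": "public",
}

print(json.dumps([entry], indent=2))
```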

The final requirement, 3c, requires agencies to engage with customers (i.e., the public) to “facilitate and prioritize data release.” This builds on requirement 3b by adding a mechanism by which the public can provide feedback to the agency in question to help set the priorities for release of additional datasets, influence the formats in which data is released, or otherwise shape agency data release processes.

Big changes are underway for the practice of data management in the Federal sector. More data release and more collaboration with customers can only serve to increase the rate of change and open up new areas in which the government is “by the people” and “for the people.”

In subsequent posts in this series we’ll delve further into the Open Data mandate and explore possible solutions for implementing its requirements in Drupal.


How to Determine Which Nodes Are Using Pathauto Paths

Submitted by Barrett on Sat, 11/17/2012 - 19:39

Recently at work, one of the site managers asked for a listing of which nodes on the site were using the auto-generated Pathauto paths and which were not. Should be easy, right? Just figure out where Pathauto stores whatever variable it uses to indicate whether a node has the "Automatic alias" parameter set and dump the list. Turns out, no, it's not that simple. Pathauto doesn't store a variable. Instead, it has a set of logic checks that determine whether the "Automatic alias" flag should be set each time a node is edited.

What you have to do, then, is load each node in question and compare the path it's using against the path Pathauto would generate for it. The pathauto_create_alias() function will tell you the latter, so long as you pass 'insert' as the second parameter instead of 'update' (otherwise, the function returns empty if it determines a new alias is not needed).

The full script I put together to satisfy the site manager's request was:

// make pathauto functions accessible
module_load_include('inc', 'pathauto');

// get published nodes from the database
$query = "select nid from {node} where status = 1";
$resultSet = db_query($query);

// load each node, compare path to what pathauto would generate and output
while ($nid = db_result($resultSet)) {

  $node = node_load($nid, NULL, TRUE);

  $placeholders = pathauto_get_placeholders('node', $node);
  $alias = pathauto_create_alias('node', 'insert', $placeholders, "node/$node->nid", $node->type, $node->language);

  // a simple boolean to indicate if the paths match, to make sorting/filtering of the results easy
  $aliasMatch = ($node->path == $alias ? 1 : 0);

  // strip line breaks, tabs, and pipes from the titles because they make a mess when we
  // open the output in Excel, then collapse runs of spaces into a single space
  $cleanTitle = str_replace(array("\r\n", "\r", "\n", "\t", "|"), " ", $node->title);
  $cleanTitle = preg_replace('/ {2,}/', ' ', $cleanTitle);

  echo "$node->nid|$node->type|$cleanTitle|$node->path|$alias|$aliasMatch" . PHP_EOL;
}



Migrating from CVS to Git

Submitted by Barrett on Sat, 11/03/2012 - 20:03

One of the initiatives begun since I came on board at USP is to convert the team from using CVS for version control to using Git. CVS was performing adequately in most respects, but the leadership recognized that Git is the de facto industry standard these days and that, beyond the technical benefits of moving to Git, there was value in keeping up with the standards of the industry and the Drupal community. The question, then, was how to best accomplish the change.

The first course we investigated was git cvsimport. Our hope was that this would allow us to move our entire repository history into Git. When we started testing it, though, we found the results to be unclear and weren't certain that everything was reflected in the resulting branches. Given that uncertainty, we decided instead to do a more static conversion and leave the pre-Git commit history in CVS.

Because of the business needs we had to support, we decided on a branching strategy of three long-running branches, one for each of our development, testing, and production environments. Because there was no point in time when all three environments were in sync, each branch needed to be initialized independently. We also wanted to minimize downtime, particularly on production. We decided to essentially wrap CVS in Git, and our process would be:

  1. Create an empty repository on the origin and add a .gitignore file to take care of the files we didn't need to track (like the CVS directories and the .# files CVS uses for tracking).
  2. Add files from the production web server by:
    1. running git init on the production server
    2. adding the repository created using git remote add
    3. checking out the master branch to pull down the .gitignore file we created
    4. adding and committing all the files from production to the master branch of the repository
  3. Set up the dev and test branches by:
    1. moving all the files from the environment to another location
    2. cloning the repository onto the web server
    3. creating and checking out a new environment branch (i.e., dev on the dev server or test on the test server)
    4. moving the original files back onto the web server
    5. adding and committing the changes to the environment branch

The end result of this process was that we had a working Git repo and still had access to all the CVS history by running CVS on the same file set. All that with no downtime on production at all and only about thirty minutes of downtime on each of the dev and test environments (including a lot of running git status to double-check that everything was as we expected).
