Open data victory for Santa Clara County map data

Via Bruce Joffe at the Open Data Consortium: the California appeals court upheld the Santa Clara County Superior Court’s decision to require Santa Clara County to provide GIS parcel basemap data under the California Public Records Act, charging no more than the cost of duplication. While 41 other counties provided basemap data for $100 or less, Santa Clara County had attempted to charge over $150,000 for the data. This is a big victory for open government data.

Legal defense was provided by the California First Amendment Coalition whose writeup is here. The full court decision.

Journalists and bloggers – twitter for tips

One of the reasons the innovative political news blog Talking Points Memo is great is Josh Marshall’s practice of soliciting tips and research from the reading community. Since @joshtpm is now on Twitter, this could be a great new source for quick tips. One of the common and powerful uses of Twitter is to ask a question of one’s readers. It’s a quick way to learn from the collective knowledge of the community.

When TalkingPointsMemo solicits tips, they can put the Twitter address among the ways to reach TPM. Journalists new to Twitter should know that you don’t need to follow people to see their tips. All the TPM team would need to do is to go to Twitter search, enter @joshtpm and see mentions of that Twitter handle. TPM could even set up a public tip line by sharing a “hashtag” – a word that starts with a hashmark – like #tpmtips. Then they can use a persistent search to see all the tweets with #tpmtips. Of course this works only for public tips – but there is plenty of information about our political system that hides in plain sight – for example, what a politician is saying in his own district.
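As a rough sketch of the persistent-search idea, here is how a batch of public tweets could be filtered for a handle or a hashtag. The sample tweets and the find_tips helper are invented for illustration; a real tip line would read tweets from Twitter search rather than from a hard-coded list.

```python
import re

def find_tips(tweets, term):
    """Return the tweets that contain the handle or hashtag `term` as a whole token."""
    # Case-insensitive match that refuses to fire when the term is only the
    # start of a longer handle or tag (#tpmtips should not match #tpmtipsy).
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    return [text for text in tweets if pattern.search(text)]

# Hypothetical sample tweets, made up for this example.
sample_tweets = [
    "@joshtpm the senator said something different at the town hall last night",
    "Lunch at the farmers market",
    "County board votes tomorrow on the records fee #tpmtips",
    "Reading about #TPMtips and other tip lines",
    "not a tip #tpmtipsy",
]

mentions = find_tips(sample_tweets, "@joshtpm")  # tweets addressed to the handle
tagged = find_tips(sample_tweets, "#tpmtips")    # tweets carrying the tip-line hashtag
```

Neither search requires following anyone, which is the point: the tips only need to be public.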

The fact that WSJ reporter Julia Angwin sees Twitter as primarily a broadcast medium – even though she learned about it by following colleagues – suggests that using Twitter as a means of learning from your audience hasn’t yet been widely adopted in the journalist community.

Stupid WordPress tricks – page sort order

WordPress lets you set up a website that looks more like a site than a blog. You can create “pages”, and compose hierarchical site navigation by putting a list of pages in a sidebar or header. There are any number of pre-packaged themes that give you multi-column templates, and you can use widgets to put navigation and other content into the sidebars.

So, how do you put the pages in the order you want? The sidebar widget lets you choose the sort order – alphabetical, or something called “page order.” What the heck is that? After some “I feel dumb” searching, I looked at a page. There is a small section to the left of the page that lets you add a number which sets the sort order. So if you want your navigation pages to be sorted 1-5, you go into each page, and put the right number for each page. Kinda clunky but works.
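To make the idea concrete, here is a small model of what “page order” does. This is not WordPress code, just a sketch: the Page class, the default order of 0, and the alphabetical tie-break are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Page:
    title: str
    order: int = 0  # the number typed into the page's order field (default assumed)

def sort_by_page_order(pages):
    """Sort pages by their order number; ties fall back to the title (an assumption)."""
    return sorted(pages, key=lambda p: (p.order, p.title.lower()))

# Hypothetical site pages, numbered 1-5 in the order they should appear.
pages = [
    Page("Contact", 5),
    Page("About", 1),
    Page("Events", 3),
    Page("Volunteer", 2),
    Page("Donate", 4),
]

nav = [p.title for p in sort_by_page_order(pages)]
```

An alphabetical sort would put Contact second; with the numbers filled in, it lands last.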

Since it took me some extra time to figure this out, I wrote this post to help others.

Social media page rank

Back in the day, Google’s pagerank used many individual acts of linking to calculate the relevance of pages. These days, the acts of linking are occurring in near-real time and in viral waves on Twitter and social network services. Links in social media are a powerful indicator of relevance: a link may have been retweeted, friendfeeded, bookmarked and facebooked. So there could be a “social network page rank” algorithm that calculated the relevance of links, and a “network digg” that showed the heat of a meme.

One problem is that Twitter’s 140 character limit encourages the use of url-shortening services like tinyurl, which obscure the destination and content of the link. So the service would need to expand and compare the urls, and do a little analysis to figure out when slightly variant urls link to the same content.
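A minimal sketch of the “expand and compare” step might look like the following. The normalization rules (lowercase host, drop “www.”, drop trailing slash, fragment and a few tracking parameters) are assumptions about what counts as “the same content”, and the expand argument stands in for whatever lookup a real service would do against the shortener; here it is faked with a dictionary.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters assumed to carry tracking info rather than content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}

def canonicalize(url):
    """Reduce slightly variant urls to one comparable form."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS)
    # The fragment (#comments and the like) is dropped entirely.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))

def count_shares(shared_urls, expand=lambda u: u):
    """Tally how often each canonical url was shared, expanding short urls first."""
    return Counter(canonicalize(expand(u)) for u in shared_urls)

# Hypothetical short-url table; a real service would resolve these over the network.
short_urls = {"http://short.example/abc": "http://example.com/story"}

shared = [
    "http://Example.com/story/",
    "http://www.example.com/story?utm_source=twitter",
    "http://example.com/story#comments",
    "http://short.example/abc",
    "http://example.com/other",
]

heat = count_shares(shared, expand=lambda u: short_urls.get(u, u))
```

Four of the five shares collapse onto the same story, which is the signal a “network digg” would want to surface.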

A secondary problem is potential spam, but a social white list – in Clay Shirky’s geek-felicitous term, foaf-filtering – could mitigate that. Social self-promotion (I’ll retweet yours if you retweet mine) could be a problem, but I suspect the echo chamber effects would be fairly localized for garden variety topics, and the pop culture or political fangames would be interesting in their own right.
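One way foaf-filtering could work, sketched under assumptions: only count shares from people within two hops of you in the follow graph, and count each person once per link so repeat posting doesn’t inflate the score. The names, the follow graph, and both helper functions are hypothetical.

```python
def foaf_whitelist(me, follows, depth=2):
    """People within `depth` hops of `me`: friends, then friends of friends."""
    trusted = {me}
    frontier = {me}
    for _ in range(depth):
        # Everyone the current frontier follows, minus people already trusted.
        frontier = {f for person in frontier for f in follows.get(person, ())} - trusted
        trusted |= frontier
    return trusted

def filtered_score(shares, trusted):
    """Number of distinct trusted people who shared each link."""
    sharers = {}
    for person, link in shares:
        if person in trusted:
            sharers.setdefault(link, set()).add(person)
    return {link: len(people) for link, people in sharers.items()}

# Hypothetical follow graph: who follows whom.
follows = {
    "me": ["ann", "bob"],
    "ann": ["cat"],
    "bob": ["ann"],
    "spam1": ["me", "ann", "bob"],  # follows everyone, followed by no one
}

shares = [
    ("ann", "http://example.com/story"),
    ("cat", "http://example.com/story"),
    ("ann", "http://example.com/story"),  # repeat share, counted once
    ("spam1", "http://example.com/pills"),
    ("spam2", "http://example.com/pills"),
    ("bob", "http://example.com/other"),
]

trusted = foaf_whitelist("me", follows)
scores = filtered_score(shares, trusted)
```

Because trust flows outward from the people you follow, a spammer following you gains nothing.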

Does this exist yet? Urls welcome.

Update 1. Via Chris Messina, Backtweets are the new Technorati.

Update 2. John Battelle’s “The Conversation is Shifting” on the trend toward social search. I don’t think it stops at neophilia.

Sydney software dev faces lawsuit for iPhone app

ZDNet Australia reports that a software developer in Sydney, Alvin Singh, is being threatened with a lawsuit for writing an iPhone app that is the second most popular item in Australia’s iPhone app store. Rail Corporation of New South Wales, the government body that administers the railroad, charges the developer with violating copyright, but offers no way to authorize developers to access the schedule data.

TransparencyCamp was all about getting social benefit from publishing and reusing government data. Bay Area TransitCamp was about ways of creating tools and services with transit information to improve transit service. There’s a movement around providing more public access to government data so citizens like Alvin Singh can provide useful services.

Developers who write useful applications with government data should get awards, not lawsuits.

Transparency 1 vs. Transparency 2

Transparency Camp revealed the contrast between old and new models of protecting the public’s right to know about our government.

At the same time as Transparency Camp, David Simon, an old beat reporter in Baltimore, wrote a piece in the Washington Post about the good old days of crime beat reporting. Armed with a knowledge of public information law and a relationship with a pro-first-amendment judge, and motivated by his role as the representative of the public’s right to know, Simon wouldn’t take recalcitrant cops’ excuses as an answer, and relentlessly pursued the truth about crime and police activity. In the article, Simon laments the demise of beat reporting. There just aren’t reporters on the street covering a topic and pursuing the truth. Even the current judge in the district doesn’t have an interest in enforcing public information access, as Simon found recently when he tried to find information about a police shooting.

Meanwhile, over at Transparency Camp, one of the attendees was Brian Sobel, the developer of the Are you Safe iPhone application that shows location-based crime information for blocks in Washington, DC. Here, information about crime is published not because one intrepid reporter made the cop turn over the crime report, but because the database of crime stats is online.

Just because there is data about a crime doesn’t mean the data is accurate or that justice is being served. In Baltimore there were no journalists or bloggers investigating the police shooting of an unarmed 61-year-old man in February, until the retired journalist started making calls. What’s needed is not only mapping but community input, like the everyday activism on Uncivil Servants, which captures reports of illegal parking by New York City employees, and like the crowdsourced journalism managed by Amanda Michel, who is taking her experience with citizen journalist campaign coverage to ProPublica. Her first assignment as Editor of Distributed Reporting is to get many eyes to cover the implementation of the stimulus bill.

In David Simon’s world, a few brave reporters had the special knowledge and connections to get enforcement of open data and open records. In our world, the government policy needs to make data available as a matter of course, and crowdsourcing tools and communities need to give more people the knowledge and the courage that David Simon had to demand accurate information from the cops.

The world is different. Open data and crowdsourcing give more people the raw information and open government literacy that David Simon had. But we need the organizational structures, funding, and motivation to use them. There’s no guarantee how well the new way will work, but there are tremendous opportunities, and it’s up to us to make them work.

Why I’ve been private on Twitter

I just decloaked @alevin for the Socialtext Signals launch, with ambivalence. There are two main reasons I was private. I tweet about trivia sometimes — farmers market visits, workouts, misadventures with car repair. I know that’s supposed to be sympathetic and human and “good for the personal brand” but darn it, I’m an introvert. I feel much more comfortable sharing trivia with people who know me, even a little, and who have chosen to pay attention. And I hate spammers and MLM scum with a passion. I hate to think they get whuffie by following me.

But public tweeting makes it easier to participate in the public conversation, so here goes.

Scale effects in enterprise social software

When people think about social software, they think big. Thomas Vander Wal’s slide on tagging is a good example. At the bottom of the scale is personal use, and the largest scale is shown as a “mature system.” (see slide 25 below).

Organizations come in various sizes and shapes, from small businesses and workgroups to very large enterprises. Even large enterprises with tens of thousands of employees are tiny compared to the scale of public social software services such as Delicious (5 million users a year ago) and SlideShare (a million users in December).

The tools and properties of social software act differently at different sizes and scales. Bigger can be better, worse, or just different.

Tagging

At a large scale, with very large numbers of users and assets, tagging reveals the knowledge of crowds. A tag cloud in a large, active community reveals clusters of interests and enables discovery as a byproduct of many tiny acts of personal organization.
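The “byproduct” effect is easy to see in a toy example: each person tags bookmarks for their own use, and simply counting the tags across people produces the cloud. The bookmark data, the tag_cloud helper, and the min_count cutoff below are invented for illustration.

```python
from collections import Counter

def tag_cloud(bookmarks, min_count=2):
    """Count tags across everyone's bookmarks; keep those used at least `min_count` times."""
    counts = Counter(tag.lower() for _, tags in bookmarks for tag in tags)
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]

# Hypothetical (person, tags) pairs, each saved for personal organization.
bookmarks = [
    ("ann", ["opendata", "gis"]),
    ("bob", ["OpenData", "transit"]),
    ("cat", ["opendata", "transit", "maps"]),
]

cloud = tag_cloud(bookmarks)
```

With three people the cloud is barely a cloud, which is the small-scale problem; the same counting over millions of bookmarks is where clusters of interest start to show.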

At a small scale, tagging doesn’t have these same effects. There isn’t enough distribution to discover interesting things about the topic space, and there isn’t enough density to make it a really efficient browsing tool on its own (although it does help with search).

On the other hand, smaller-scale tagging is useful. At a small scale, with good processes, tagging can be used to create workflows and update feeds that are useful and also adaptable.

Ranking and Rating

At the level of Amazon and NetFlix, ratings from a large base of participants quickly reveal the hits and the dogs. Rating is a quick action that lets casual participants contribute quickly and get something back in the form of recommendations to match their tastes. In a good-sized knowledgebase, rating is valuable to indicate which content is serving its purpose and what needs improvement.

At a smaller scale, ratings are a very different tool than at larger scale. Ratings can be an excellent decision support tool to assess the popularity and priority of ideas in a finite period of time. Ideas are generated and fleshed out. Then, ranking and voting are used to prioritize the ideas.

There are also ways that ratings can be counterproductive at a small scale. In a discussion community where there are typically a handful of contributors or commenters per topic, having one or two ratings on each post is a meaningless waste of space (see an example of this anti-pattern on the Personal Democracy Forum, an otherwise excellent blog). In communities where participation is high, asking for “ranking” – a low-engagement activity – can end up reducing contribution. If you see something that you have a chance to make better, why make it easier to point out that it is broken than to go ahead and fix it?

Participation

Participation is an area where smaller communities have very different dynamics than large ones. In consumer communities, there are typically a very small number of very active contributors. The vast majority of people are consumers, and only a small percentage do any amount of contributing. In a large community, that small number results in vast creativity, but it’s still a small percentage of the whole. By contrast, healthy collaborative intranets have much higher active participation, often in the low tens to 50 percent or more.


Large social systems have significant challenges with social misbehavior. Large crowds where people don’t know each other bring out anti-social behavior (trolling, spam, just plain idiocy). So large communities need to implement explicit reputation systems to reduce the noise caused by anti-social participants.

In smaller communities, where people use their real names and have real-world accountability for their actions, misbehavior is exceedingly rare. Facilitation is helpful to foster productive interaction and help a group head in a common direction, but explicit reputation isn’t needed. More implicit reputation revealed by contributions and participation can be interesting and relevant.


There are two social software principles that are at odds with each other:

  • social software gains value as more people use it.
  • social software is that which can be spammed.

On the one hand, large scale networks can yield valuable insights, where large numbers of people tag, rate, and link information and each other. On the other hand, smaller groups have higher participation, collaboration, and civility.

Administrators and champions who foster social networks should think about the scale of their community, and use tools and techniques appropriate for the scale.