Tuesday 20 October 2009

SharePoint Advanced Search Properties Don't Work - the skinny on Created By, Modified By, Author and more

If you've done much work with the Advanced Search web part, it won't have taken you long to discover that most of the default property searches simply don't work. At the time of writing there are dozens of very long forum threads describing the problem, with no comprehensive fix in sight from Microsoft or anyone else. Perhaps until now...?

What's not Broken?

Despite the claims of many, the Size and (Created/Modified) Date properties are not completely broken. They simply require input in the correct format. It's also worth noting that property searches require the exact value. No wildcards or partials!
  • Size - takes a value in bytes, so that 1000000 = 1 MB.
  • Dates - require a xx/xx/xx or xx/xx/xxxx format. The order of the days/months will depend on your regional settings. If you're still getting no results, then check the metadata mappings and XSLT references below.
Although uncovering the correct format was a pain, it was nothing compared with what followed.
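To make the two input formats concrete, here is a minimal sketch of helpers that produce values in the shape described above. The function names are hypothetical, and the day-first default is only an assumption about your regional settings.

```python
from datetime import date


def megabytes_to_bytes(megabytes):
    """Convert megabytes to the byte value the Size property expects.

    Uses the post's rule of thumb that 1000000 bytes = 1 MB.
    """
    return int(round(megabytes * 1_000_000))


def format_search_date(value, day_first=True):
    """Format a date as xx/xx/xxxx.

    day_first=True gives dd/mm/yyyy, day_first=False gives mm/dd/yyyy.
    Which one works depends on your regional settings (assumption).
    """
    pattern = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return value.strftime(pattern)


if __name__ == "__main__":
    print(megabytes_to_bytes(2.5))
    print(format_search_date(date(2009, 10, 20)))
```

Paste the resulting values into the Advanced Search property fields exactly as printed, since partial values won't match.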

What is Broken and Why?

As many have discovered, the Created By, Last Modified By, Created Date, Last Modified Date and Author properties are all affected to some degree.
The root cause is improper mapping of crawled properties to their managed properties within Search Administration, and the problem carries all the way through to the XSLT for the Advanced Search and Search Core Results web parts.
The reasons behind these poor relationships become obvious shortly after you begin looking for a solution. To be frank, it also becomes obvious why most people gave up trying!

Thanks to...

Many thanks go to Anne Stenberg and her 6-part series entitled Mystery Solved - Crawled Properties in SharePoint.
In this series Anne patiently and painstakingly goes through every last property in each defined category and provides a description for many. I'm not entirely sure where she came by all this information, but it proved invaluable for identifying and testing the results of the many changes to my metadata property mappings.

Please explain!

Using a combination of Anne's tables, the U2U CAML Query Builder feature, the ever useful SharePoint Manager, and the XSLT within the search web parts - it quickly becomes obvious that it's going to take more than a packet of off-the-shelf headache tablets to get through this.
Without going into too much detail - ignorance being bliss - let's take a look at something as simple as Author.
  • We have a visible Author column whose internal name is _Author.
  • A hidden Created By column whose internal name is Author.
  • And a managed property called Author that seems to want to hedge its bets by trying to cover all these bases as well as a few more.
But that's just the beginning - Created By and Modified By searches will invariably return zero results and also have their fair share of possible mappings and hidden values. What the heck is Write anyway?? Apparently just another value for Modified Date...but more on that later. I'm sure anyone who's that interested can do their own research. I won't bore everyone else any further.

What's the fix already!?

OK, OK. Keep your propeller hat on.
After days of stuffing around, tweaking mappings, modifying web part properties and performing a full crawl each time(!) I have finally found - I think - a solution. At least, a number of searches - using Author, Created By and Last Modified By properties with the AND operator - all returned correct results.
It's also worth noting that this solution is not Office-centric and will work with any document type.

First, the Metadata

This assumes a good knowledge of Central Administration. If you require detailed steps they can be found elsewhere.
You can use all or some of the settings shown below but the only ones that really matter are the Mappings themselves. After you've added the crawled properties, be sure to click each one and check the "Include values for this property in the search index" checkbox, otherwise it won't get added to the index! In all cases I went with the default "Include values from all crawled properties mapped" option.
Also note that there are often TWO properties with exactly the same name - e.g. Office:4(Text). Picking the right one is essential and I have provided the Property Set IDs below where this is relevant.
And, remember, what follows is in no way Gospel - it's just what worked for me.
NB: Don't forget to run a Full Crawl after making these changes.
Property Name    | Type          | May be deleted | Use in scopes | Mappings
Author           | Text          | No             | Yes           | _Author(Text), ows__Author(Text)
Created          | Date and Time | Yes            | No            | Office:12(Date and Time), Basic:15(Date and Time)
CreatedBy        | Text          | Yes            | No            | Office:4(Text), ows_Created_x0020_By(Text)
LastModifiedTime | Date and Time | No             | Yes           | Basic:14(Date and Time), Basic:16(Date and Time), ows_Modified(Date and Time)
ModifiedBy       | Text          | Yes            | Yes           | Office:8(Text)

Property Set IDs for the crawled properties that share a name with another property:
  • Office:12(Date and Time) - f29f85e0-4ff9-1068-ab91-08002b27b3d9
  • Basic:15(Date and Time) - b725f130-47ef-101a-a5f1-02608c9eebac
  • Office:4(Text) - f29f85e0-4ff9-1068-ab91-08002b27b3d9
  • Office:8(Text) - f29f85e0-4ff9-1068-ab91-08002b27b3d9
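If you want to keep these mappings somewhere more durable than a blog post, one option is to record them as data and print a checklist to tick off in Search Administration. This is only a sketch: the variable and function names are hypothetical, and nothing here talks to SharePoint.

```python
# Property Set IDs quoted above for the ambiguously named crawled properties.
OFFICE_SET_ID = "f29f85e0-4ff9-1068-ab91-08002b27b3d9"
BASIC_15_SET_ID = "b725f130-47ef-101a-a5f1-02608c9eebac"

# Managed property -> list of (crawled property, Property Set ID or None).
MAPPINGS = {
    "Author": [("_Author(Text)", None), ("ows__Author(Text)", None)],
    "Created": [
        ("Office:12(Date and Time)", OFFICE_SET_ID),
        ("Basic:15(Date and Time)", BASIC_15_SET_ID),
    ],
    "CreatedBy": [
        ("Office:4(Text)", OFFICE_SET_ID),
        ("ows_Created_x0020_By(Text)", None),
    ],
    "LastModifiedTime": [
        ("Basic:14(Date and Time)", None),
        ("Basic:16(Date and Time)", None),
        ("ows_Modified(Date and Time)", None),
    ],
    "ModifiedBy": [("Office:8(Text)", OFFICE_SET_ID)],
}


def checklist(mappings):
    """Return one line per mapping, noting the Property Set ID where it matters."""
    lines = []
    for managed, crawled in mappings.items():
        for name, set_id in crawled:
            suffix = f" [property set {set_id}]" if set_id else ""
            lines.append(f"{managed} <- {name}{suffix}")
    return lines


if __name__ == "__main__":
    for line in checklist(MAPPINGS):
        print(line)
```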

Advanced Search XSLT

The following go in PropertyDefs. Many of these are default values, so I'm just providing the full block. You'll then need to add the same 'Name' references to each ResultType in the order you prefer.
<PropertyDef Name="Author" DataType="text" DisplayName="Author"/>
<PropertyDef Name="Size" DataType="integer" DisplayName="Size"/>
<PropertyDef Name="Keywords" DataType="text" DisplayName="Keywords"/>
<PropertyDef Name="CreatedBy" DataType="text" DisplayName="Created By"/>
<PropertyDef Name="Created" DataType="datetime" DisplayName="Created Date"/>
<PropertyDef Name="ModifiedBy" DataType="text" DisplayName="Last Modified By"/>
<PropertyDef Name="LastModifiedTime" DataType="datetime" DisplayName="Last Modified Date"/>
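Typing the same names twice (once as a definition, once as a reference in each ResultType) is an easy place to slip up, so here is a rough sketch that generates both from one list. The PropertyRef element, the ResultType attributes and the PascalCase spelling are my assumptions about the web part's properties XML, so compare the output against your own web part before pasting anything in.

```python
import xml.etree.ElementTree as ET

# (Name, DataType, DisplayName) taken from the block above.
PROPERTIES = [
    ("Author", "text", "Author"),
    ("Size", "integer", "Size"),
    ("Keywords", "text", "Keywords"),
    ("CreatedBy", "text", "Created By"),
    ("Created", "datetime", "Created Date"),
    ("ModifiedBy", "text", "Last Modified By"),
    ("LastModifiedTime", "datetime", "Last Modified Date"),
]


def build_property_defs(properties):
    """Build a PropertyDefs element with one PropertyDef per property."""
    root = ET.Element("PropertyDefs")
    for name, data_type, display_name in properties:
        ET.SubElement(
            root,
            "PropertyDef",
            {"Name": name, "DataType": data_type, "DisplayName": display_name},
        )
    return root


def build_result_type(properties, name="default", display_name="All Results"):
    """Build a ResultType element referencing every property by Name.

    The name/display_name defaults are illustrative assumptions.
    """
    result_type = ET.Element("ResultType", {"DisplayName": display_name, "Name": name})
    for prop_name, _, _ in properties:
        ET.SubElement(result_type, "PropertyRef", {"Name": prop_name})
    return result_type


if __name__ == "__main__":
    print(ET.tostring(build_property_defs(PROPERTIES), encoding="unicode"))
    print(ET.tostring(build_result_type(PROPERTIES), encoding="unicode"))
```

Reorder the PROPERTIES list to change the order in which the properties are referenced.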

Search Core Results XSLT

Unless you're trying to provide custom results using some of the values described above you won't need to make any changes here. Quite frankly it's a little daunting but great things can be done - such as displaying Size, Author and a custom link to open the containing folder for each result. I'll probably leave this for another post as it's a topic in itself.

In conclusion...

So, hopefully, if you've done everything right and performed a full crawl, you should now be able to search using one or all of the properties we've discussed here.
One thing you may find still doesn't work is the "Does not equal" operator. You might also notice that it's not described anywhere in the web part code but is managed by a separate core JavaScript file. I'm just not willing to look into this right now - or the reasons why "Contains" and "Doesn't contain" aren't available for partial search term querying. If anyone else has any ideas - performance notwithstanding - feel free to drop me a line.
I look forward to any further insight and feedback others might have and hope that all my hard work isn't undone with the next upgrade!

Thursday 16 July 2009

Deleting unused SharePoint Content Types

It happens to every site collection admin at some stage. For whatever reason you're required to replace a defunct content type in your document library with a new one. Adding the new one is easy. But then you try to delete the old one and receive the terrifically informative "Content Type is still in use" error.

You've tried everything:

  • Updated any files you could find with the new content type.
  • Checked every file twice to make sure you didn't miss any.
  • Managed Checked Out files to locate all those tricky 'hidden' docs that haven't been checked in yet and only exist in some funky temp/draft state! (You'll either need to take ownership of these as site collection admin, or email a list to the original authors to get them to check them in or delete them. I can't believe there is no option to delete these en masse!)
  • Run custom CAML queries to make REALLY sure you didn't miss any (using the fabulous U2U SharePoint CAML Query feature - www.u2u.be/res/Software.aspx).
  • Written a console app just to make ABSOLUTELY sure you didn't miss any.
  • Emptied the recycling bin - both of them!

But still the error persists.

Just when I thought I'd exhausted all options it occurred to me that perhaps versions were the culprit. Looking at the version history for a few suspect documents confirmed that they were.

Running the following SQL query will find them all.

SELECT *
FROM AllUserData
WHERE (tp_DirName LIKE '%sitename/Shared Documents%')
AND ((tp_ContentType = 'myDodgyCT'))
ORDER BY tp_DirName

This should be self-explanatory, but tp_DirName is just the relative path from the domain to the library, and tp_ContentType is the exact name of the content type to search on.

From there I just exported the results to Excel, filtered out duplicates and was left with a workable list of documents. Simply publish a final version of each (if required), then go to the Version History and delete all previous versions.
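If you'd rather skip Excel, the de-duplication can be sketched in a few lines of Python. This assumes you exported the query results as CSV with tp_DirName and tp_LeafName columns (tp_LeafName, the file name column, is my assumption since the query above selects everything); the sample rows are made up.

```python
import csv
import io


def unique_documents(csv_text):
    """Collapse one-row-per-version results into a sorted list of unique document paths."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    for row in reader:
        # Every version of a document shares the same directory and file name.
        seen.add(f"{row['tp_DirName']}/{row['tp_LeafName']}")
    return sorted(seen)


# Hypothetical export: two versions of one file, one version of another.
SAMPLE_EXPORT = """tp_DirName,tp_LeafName,tp_ContentType
sitename/Shared Documents,report.doc,myDodgyCT
sitename/Shared Documents,report.doc,myDodgyCT
sitename/Shared Documents/Archive,notes.doc,myDodgyCT
"""

if __name__ == "__main__":
    for path in unique_documents(SAMPLE_EXPORT):
        print(path)
```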

You could do all this in one step by turning off versioning for the library and then turning it on again. But this would delete versions for ALL documents. A good trick to remember when your site quota is reached.

Good luck!

Thursday 18 June 2009

Import Publishing Sub Site as Site Collection

Any farm administrator will eventually feel the need to restructure when they watch in horrid fascination as their once insignificant sub sites grow to monster sites. Current MOSS limitations such as the 15GB limit for the stsadm import/export operations mean that unless you're using a terrifically expensive 3rd party migration app you need to act fast before things get out of hand.

The obvious solution would be to export the site(s) in question and then reimport them to an empty site collection. Yes, that would seem obvious, but it's not quite as straightforward as you might assume. What really compounds the matter is if you happen to have enabled the publishing feature on your source site.

After many days of disappointment and incoherent error messages I finally returned to Gary Lapointe's brilliant blog - SharePoint Automation. Gary has developed dozens of custom stsadm commands in order to fill the cavernous and aching void left by the default offerings. One thread in particular, Subsite to Site Collection, deals with this pesky issue and also includes his own hair-pulling frustration with the out-of-the-box limitations to achieve this. BTW, the solutions he provides are not strictly supported - but they work where nothing else will and are very easy to use.

In my case I simply deleted every failed test and started from scratch using the gl-convertsubsitetositecollection operation. All I had to do was provide the source site and destination URL and the command did the rest.

stsadm -o gl-convertsubsitetositecollection -sourceurl http://mydomain.com/source -targeturl http://mydomain/destination -nofilecompression -owneremail me@home.com -ownerlogin "mydomain\me" -nositetemplate 

This exports the source site, creates the destination site collection and site (based on source site template), and activates any features. The source site was a Team Site with publishing and several custom features enabled. The only thing I did was create the managed path but this command even has a flag for that!
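If you have several sub sites to convert, it may help to build the command from a small script rather than retyping it. The sketch below only assembles the argument list using the flags shown in the command above; the function name is hypothetical, and actually running it requires Gary's extensions to be installed on the server.

```python
import subprocess


def build_convert_command(source_url, target_url, owner_email, owner_login):
    """Assemble the gl-convertsubsitetositecollection call as an argument list."""
    return [
        "stsadm", "-o", "gl-convertsubsitetositecollection",
        "-sourceurl", source_url,
        "-targeturl", target_url,
        "-nofilecompression",
        "-owneremail", owner_email,
        "-ownerlogin", owner_login,
        "-nositetemplate",
    ]


def run_command(args):
    """Run the command on the SharePoint server and raise if it fails."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    command = build_convert_command(
        "http://mydomain.com/source",
        "http://mydomain/destination",
        "me@home.com",
        "mydomain\\me",
    )
    # Print rather than run, so the command can be reviewed first.
    print(" ".join(command))
```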

The only thing it doesn't appear to do (yet) is allow you to specify a new content database. I've already put my request in for this and hope to see it soon. If this helps someone, please thank Gary - not me! :)

Monday 15 June 2009

Renaming SharePoint Site Collections (the Secret of Managed Paths)

Everyone knows, or at least should know, that SharePoint sites (formally referred to as Webs) can be easily renamed with either stsadm -o renameweb or through the GUI.

But what about renaming site collections?

Well, the short answer is that you can't. Site collections cannot be renamed, at least not those on managed paths.

e.g. domain1.com/sites/sitecollection_name

Hosted site collections using host headers can be renamed using the stsadm -o renamesite command but it is limited to hosted sites only.

e.g. domain1.com -> domain2.com

Yes, another massive limitation of the current SharePoint version. But the real culprit - and what makes this all the more confounding - is managed paths. If you've never had any real experience with these little beasties then I envy you. But get ready to put your hard hat on because we're going in.

You can create a managed path for a site collection using one of two options - "Explicit inclusion" or "Wildcard inclusion" - but not both! The limitations of this will become obvious soon.

Let's say you want the following structure for your site collections:

domain1.com/groups
domain1.com/groups/team1

- where both groups and team1 are site collections.

This cannot be achieved conventionally. The reason is that to create a site collection at the managed path /groups, the path must be set to "Explicit inclusion" - meaning you cannot create site collections below it.

And if you create the /groups path to use "Wildcard inclusion" then you can only create site collections below this path, not at the path itself. Make sense?

i.e. domain1.com/groups/my_sitecollection

This means that site collections can have no distinct hierarchy. Not that this matters given that they are treated as separate entities and current version web parts are unable to query across site collections anyway!

So is there a workaround to both these limitations? I'm glad you asked. ;)

The following covers renaming a single site collection that sits on an "Explicit inclusion" path only. Depending on your needs you can use "Wildcard inclusion" for any subsequent site collections.

To rename a site collection you need to:

  1. Back up your existing site collection using stsadm -o backup.
  2. Create an "Explicit inclusion" managed path, newpath, using the new name you want.
  3. Restore the site collection to the new URL using stsadm -o restore.
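The three steps above can be sketched as a sequence of stsadm calls. The function name, URLs and backup file path are placeholders, and note that stsadm's addpath operation names the explicit path type explicitinclusion. Treat this as a starting point to review, not a tested migration script.

```python
def rename_site_collection_commands(old_url, new_url, backup_file):
    """Return the stsadm calls for backup, managed path creation and restore."""
    return [
        # 1. Back up the existing site collection.
        ["stsadm", "-o", "backup", "-url", old_url, "-filename", backup_file],
        # 2. Create the explicit managed path for the new name.
        ["stsadm", "-o", "addpath", "-url", new_url, "-type", "explicitinclusion"],
        # 3. Restore the backup to the new URL.
        ["stsadm", "-o", "restore", "-url", new_url, "-filename", backup_file],
    ]


if __name__ == "__main__":
    commands = rename_site_collection_commands(
        "http://domain1.com/oldpath",      # placeholder source URL
        "http://domain1.com/newpath",      # placeholder target URL
        "C:\\backups\\sitecollection.bak",  # placeholder backup file
    )
    for command in commands:
        print(" ".join(command))
```

The original site collection is left in place, so you can check the restored copy before deciding what to do with the old one.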

To create a virtual URL hierarchy you can:

  1. Create an "Explicit inclusion" managed path of newpath/newsite.
  2. Create a new site collection using this path.

Yes, it's true! MOSS will allow you to create a managed path in this form, despite the explicit inclusion on the original /newpath path.

Great, isn't it? Now, whether this is intended or flawed behaviour, or a good idea or not, is another matter entirely. ;)

Either way - good luck! And may version 14 resolve these and many other issues we're plagued with.