The Search tools in Vault and Productstream 2009 have changed significantly from 2008. These changes were made to improve both flexibility and performance. Searches that previously ran for many minutes (or in some cases did not complete at all) now complete in seconds! The flexible new user interface also provides a dramatic improvement in usability. These topics and more are discussed here.
Performance
Significant architectural changes were made by separating the search indexing from the main database. This change impacts the administrator and end-user workflows in many ways. Search indexing used to be both automatic and immediate. It is still automatic, but it is no longer immediate. After a file is added to the system, it could be a couple of minutes before the file appears in a search. For most customers indexing will complete in minutes; in large environments it might take as long as a couple of hours. Hopefully this one-time inconvenience is a fair trade for the performance gains.
Raw speed - We have plenty of it! As an example, one customer data set with over four hundred thousand files had a basic search time of over 280 seconds in Productstream 2008. With Vault 2009 this was reduced to less than 15 seconds. Another customer data set took over 80 seconds to display the Item master with Productstream 2008. With Productstream 2009 the Item master displays in less than 5 seconds.
Paging
In the 2008 products there was a performance loophole that allowed an individual user to tie up the server and force everyone else to wait. If a user performed an unbounded search, the server was consumed with that search until it completed. By closing this loophole, Paging provides consistent performance for the environment as a whole. All users receive an equally prompt response from the server at all times.
With Paging, each user receives one page of results when performing a search or viewing the Item or Change Order list. The size of the page is set by the administrator; the default page size is 100 records. The page model used here is NOT like a web page where the user moves forward and backward. Instead, this page model is compounding, typically referred to as a ‘More’ model. A user performs a search and receives the first 100 results (using the default value for this example). The user then selects the More button, which appends the second page to the first. At this point the user is viewing 200 records. The user may continue to select the More button until it no longer appears. The absence of the More button indicates that all records are visible. Usability testing has shown that most users refine the search if what they were looking for was not found on the first page.
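The compounding behavior can be sketched in a few lines of Python. This is only an illustration of the ‘More’ model, not Vault code: the `PagedResults` class and its method names are hypothetical, and the list passed in stands in for the result set held by the server.

```python
class PagedResults:
    """Illustrative 'More' paging: each request appends one page to the view."""

    def __init__(self, all_matches, page_size=100):
        # Stand-in for the full result set held by the server.
        self._all_matches = list(all_matches)
        # Page size is set by the administrator; 100 is the default.
        self._page_size = page_size
        # Records currently shown to the user.
        self.visible = []
        # The first page arrives together with the search itself.
        self.more()

    def has_more(self):
        """True while the More button would still be displayed."""
        return len(self.visible) < len(self._all_matches)

    def more(self):
        """Append the next page to the records that are already visible."""
        start = len(self.visible)
        self.visible.extend(self._all_matches[start:start + self._page_size])


results = PagedResults(range(250))
print(len(results.visible), results.has_more())  # after the first page
results.more()
print(len(results.visible), results.has_more())  # second page appended to the first
results.more()
print(len(results.visible), results.has_more())  # final, partial page
```

Because every request is bounded by the page size, no single search can occupy the server for an unbounded amount of time.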
Tokens
The new search system indexes tokens in place of whole property values. This means that all property values are separated into individual searchable chunks known as tokens. Tokens offer some powerful capabilities, but using them effectively requires a basic understanding of how they work. We will start by explaining how property values are broken into tokens. The following grid shows file names as an example. On the right side of the grid you can see how each file name breaks down into tokens.
| File Name | Tokens |
|---|---|
| A-055401-321.ipt | “A”, “-”, “055401”, “-”, “321”, “ipt” |
| Great White Shark.doc | “Great”, “White”, “Shark”, “doc” |
| Gr8work.xls | “Gr”, “8”, “work”, “xls” |
The rules for converting to tokens are as follows.
All adjacent characters of like type are grouped into a single token:
Alphabetic (A, B, C, … Z)
Numeric (0, 1, 2, … 9)
Special punctuation (-, _, @, … $)
Some special characters are recognized as searchable objects. All other punctuation and special characters are not searchable and therefore are not contained in the tokens. In the previous example, notice that all file extensions are a separate token because they are separated by a '.' even though the '.' is not a searchable character. The following list contains all the special characters that are searchable. Any special character not on this list is not searchable.
| Name | Character |
|---|---|
| Dollar | $ |
| Dash | - |
| Underscore | _ |
| At symbol | @ |
| Plus | + |
| Pound | # |
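As a rough illustration of the rules above, the following Python sketch splits a property value into tokens. It is not the Vault implementation: the `tokenize` function and its regular expression are assumptions made for this example, and only ASCII letters and digits are handled.

```python
import re

# Searchable special characters from the list above: $ - _ @ + #
# Hypothetical pattern: a run of letters, a run of digits, or a run of
# searchable special characters. Anything else (a space, a '.', and so on)
# only separates tokens and is dropped.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[$\-_@+#]+")


def tokenize(value):
    """Split a property value into tokens of adjacent, like-typed characters."""
    return TOKEN_PATTERN.findall(value)


for name in ("A-055401-321.ipt", "Great White Shark.doc", "Gr8work.xls"):
    print(name, "->", tokenize(name))
```

Running it on the three file names from the grid reproduces the same breakdown, including the file extension as its own token.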
Searching
Using our new understanding of how property values break down into tokens, we can be smart about how we find our data. We will use the following sample set to explore some examples.
| File Name | Vendor | Material | Author |
|---|---|---|---|
| Design00.idw | Fabricated | Alum | Des |
| Design01.dwg | Fabricated | Alum | Des |
| Design02.idw | Fabricated | Alum | Al |
| Design001.idw | Fabricated | Alum | Al |
| Design023.prt | Fabricated | ABS | Al |
| Design054.prt | Fabricated | ABS | Ed |
| Design055.dwg | Fabricated | ABS | Ed |
| Part01.ipt | Autodesk | Titanium | |
| Part02.ipt | Autodesk | Titanium | |
The obvious advantage is the ability to search for a token independently. We are not limited to searching for individual tokens, but it is an option. This change provides the ability to search for values that were previously only available as part of a larger search result.
EXAMPLE 01
Basic Search: 'Des'
Result in 2008:
Design00
Design01
Design02
Design001
Design023
Design054
Design055
Des (Design00.idw) - Finds both Author and File name
Des (Design01.dwg) - Finds both Author and File name
Autodesk (Part01.ipt)
Autodesk (Part02.ipt)
Result in 2009:
Des (Design00.idw) – Hits only on Author
Des (Design01.dwg) – Hits only on Author
The precise search results in 2009 are achieved by NOT assuming wildcards. In 2008, a star ‘*’ wildcard was automatically added before and after the search value.
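The difference can be modeled with a short Python sketch. Both search functions are simplified assumptions, not the actual product logic: the 2008 search is modeled as the term wrapped in ‘*’ and compared against whole property values, the 2009 search as the term compared against individual tokens, and both comparisons are assumed to ignore case.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical tokenizer: runs of letters, digits, or searchable special characters.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[$\-_@+#]+")

# (File Name, Vendor, Material, Author) rows from the sample set above.
SAMPLE = [
    ("Design00.idw", "Fabricated", "Alum", "Des"),
    ("Design01.dwg", "Fabricated", "Alum", "Des"),
    ("Design02.idw", "Fabricated", "Alum", "Al"),
    ("Design001.idw", "Fabricated", "Alum", "Al"),
    ("Design023.prt", "Fabricated", "ABS", "Al"),
    ("Design054.prt", "Fabricated", "ABS", "Ed"),
    ("Design055.dwg", "Fabricated", "ABS", "Ed"),
    ("Part01.ipt", "Autodesk", "Titanium", ""),
    ("Part02.ipt", "Autodesk", "Titanium", ""),
]


def search_2008(term):
    """Model of the 2008 basic search: '*' is added on both sides of the term."""
    pattern = "*" + term.lower() + "*"
    return [row[0] for row in SAMPLE
            if any(fnmatchcase(value.lower(), pattern) for value in row)]


def search_2009(term):
    """Model of the 2009 basic search: the term must match one whole token."""
    pattern = term.lower()
    return [row[0] for row in SAMPLE
            if any(fnmatchcase(token.lower(), pattern)
                   for value in row
                   for token in TOKEN_PATTERN.findall(value))]


print(search_2008("Des"))  # every file: 'des' occurs inside Design... or Autodesk
print(search_2009("Des"))  # only the two files whose Author token is 'Des'
```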
EXAMPLE 02
Basic Search: '0?'
Result in 2008:
Design00
Design01
Design02
Design001
Design023
Design054
Design055
Result in 2009:
Design00
Design01
Design02
If you are familiar with the wildcard ‘?’ you know that it is typically a wildcard for a single character. This is correctly represented in 2009 but in 2008 it is not. The reason is that in 2008 all basic searches are automatically enclosed with wildcards. The resulting search contains the ‘*’ on both the front and back end which results in a search that looks like ‘*0?*’. The result of this basic search in 2008 for ‘0?’ is effectively no different from a basic search for ‘0’.
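A small Python sketch shows why the automatic wrapping changes the meaning of ‘?’. The two helper functions are hypothetical models of the behavior described here (wrapped term against the whole value for 2008, unmodified term against a single token for 2009), not code from either release.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical tokenizer: runs of letters, digits, or searchable special characters.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[$\-_@+#]+")


def hit_2008(value, term):
    """Assumed 2008 behavior: the term is wrapped in '*' and compared to the whole value."""
    return fnmatchcase(value.lower(), "*" + term.lower() + "*")


def hit_2009(value, term):
    """Assumed 2009 behavior: the term, as typed, must match one whole token."""
    return any(fnmatchcase(token.lower(), term.lower())
               for token in TOKEN_PATTERN.findall(value))


# Columns: name, 2008 search for '0?', 2008 search for '0', 2009 search for '0?'
for name in ("Design02.idw", "Design001.idw", "Design054.prt"):
    print(name, hit_2008(name, "0?"), hit_2008(name, "0"), hit_2009(name, "0?"))
```

In this model the two 2008 columns agree for all three names, while the 2009 column is true only for Design02.idw, whose numeric token is exactly two characters long.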
Another change between 2008 and 2009 is the behavior of the search condition ‘Contains’. In 2008, a search of type 'Contains' was automatically executed with both leading and trailing wildcards. In 2009, a search of type 'Contains' supports trailing wildcards.
To explain how wildcards are applied to tokens, we have created the following diagram. The important point is that a wildcard applies to only one token. As shown below, the wildcard does NOT span multiple tokens.
In this example, we are searching for the string: Gr*k
| Strings that are found | Tokens |
|---|---|
| Grok | “Grok” |
| greek1 | “greek”, “1” |
| GreenEggsAndPork | “GreenEggsAndPork” |
| GRK-000003$ | “GRK”, “-”, “000003”, “$” |

| Strings that are NOT found | Tokens |
|---|---|
| Great White Shark | “Great”, “White”, “Shark” |
| gr8work | “gr”, “8”, “work” |
| Grand_Trunk Railroad | “Grand”, “_”, “Trunk”, “Railroad” |
| GR-000001-PK-5 | “GR”, “-”, “000001”, “-”, “PK”, “5” |
Conclusion
Searching with Vault and Productstream 2009 is both faster and more precise than in previous releases. This improvement is achieved through the introduction of a token-based search index and paging. Understanding how to use a token-based search system provides lightning-fast searching with surgical precision.
Thanks Ross!