Using DAOS Estimator

The DAOS Estimator (daosest.exe) is a tool for planning the rollout of DAOS on a Domino 8.5 server. The tool iterates through all the requested databases, scanning for documents with attachments. It keeps a list of all the attachments so that it can estimate the savings from duplicate attachments found within each database as well as across all databases.

Install:

How you install the DAOS Estimator depends on the platform of your Domino server. Begin by downloading daosest for the correct platform, then put a copy of it in the Domino executable directory. The DAOS Estimator is built as an SDK application, so it may be run against any version of the Domino server. Finally, make sure that the permissions are set correctly.

Run:

You can run the DAOS Estimator with the server either up or down. To run it on the server, type 'lo daosest '. To run it with the server down, go to a command prompt and cd to the Domino directory. Then type daosest -h to get the help screen, as shown here:

IBM DAOS Savings Estimator tool, Version 1.0
Copyright (c) IBM 2008. All rights reserved.

daosest [OPTIONS]
  -h  display this message
  -o  output to file
  -v  verbose, displays file information

Note: Default input path is data directory.

Note: See bottom of document for new features added in v1.4

The default is to run against the data directory as defined in the notes.ini. You may also run it against a subdirectory or against individual nsf files.

The verbose flag outputs individual file information to the console or output file. Be aware that this generates a lot of information, most of which is unnecessary.

You can send the output to a file using the -o option. The advantage of this option over piping the output to a file is that information is still sent to the console, so the user can see which database is currently being analyzed. The file output is also slightly wider, which makes it easier to read.
Output:

The first section displays per-database information.

Database Name  Orig NSF  New NSF   Num     DAOS    Dup     Compr     Space     DAOS Ob
               Size      Size      Files   Files   Files   Size      Savings   Size
=============  ========  ========  ======  ======  ======  ========  ========  ========

Database Name – file name of the database.
Orig NSF Size – current size of the database on disk.
New NSF Size – estimated size of the database with attachments removed.
Num Files – total number of attachments found in the database.
DAOS Files – total number of attachments that are DAOS eligible.
Dup Files – total duplicate files found in the database.
Compr Size – total compressed size of all attachments in the database.
Space Savings – total space savings from the database, which is the total size of all duplicate attachments.
DAOS Ob Size – total size of all attachments in the database, excluding duplicates.

Note that all values are rounded to kilobytes. The Orig NSF Size should be approximately equal to the DAOS Ob Size plus the Space Savings plus the New NSF Size.

Example Output:

IBM DAOS Savings Estimator tool, Version 1.0
Copyright (c) IBM 2008. All rights reserved.
Database Name  Orig NSF  New NSF   Num     DAOS    Dup     Compr     Space     DAOS Ob
               Size      Size      Files   Files   Files   Size      Savings   Size
=============  ========  ========  ======  ======  ======  ========  ========  ========
l\k######.nsf  3.1 GB    989.8 MB  6473    6473    2203    2.2 GB    424.7 MB  1.7 GB
\k#######.nsf  4.4 GB    1.0 GB    5083    5083    1492    3.4 GB    879.6 MB  2.5 GB
\k#######.nsf  136.5 MB  124.9 MB  87      87      12      11.6 MB   6.7 MB    4.9 MB
mail\k###.nsf  1.3 GB    330.2 MB  2437    2437    719     987.8 MB  301.4 MB  686.4 MB
mail\k###.nsf  550.0 MB  447.7 MB  824     824     210     102.3 MB  18.6 MB   83.8 MB
il\k#####.nsf  660.8 MB  369.0 MB  1097    1097    250     291.8 MB  32.8 MB   259.0 MB
l\l######.nsf  2.5 GB    1.3 GB    6096    6096    2391    1.2 GB    415.2 MB  781.7 MB
ail\l####.nsf  2.3 GB    1.3 GB    2839    2839    877     976.3 MB  172.4 MB  804.0 MB
\l#######.nsf  2.6 GB    2.0 GB    1592    1592    417     672.1 MB  159.4 MB  512.6 MB
il\l#####.nsf  171.0 MB  148.5 MB  57      57      5       22.5 MB   1.3 MB    21.2 MB
l\m######.nsf  1.5 GB    528.2 MB  2424    2424    848     1.0 GB    196.4 MB  848.1 MB
\m#######.nsf  7.9 GB    2.3 GB    17736   17736   5266    5.7 GB    1.0 GB    4.6 GB
\m#######.nsf  179.5 MB  140.6 MB  246     246     102     38.9 MB   12.9 MB   26.0 MB
ail\m####.nsf  414.0 MB  211.6 MB  221     221     41      202.4 MB  35.4 MB   167.0 MB
il\m#####.nsf  3.1 GB    763.7 MB  7749    7749    2845    2.3 GB    527.8 MB  1.8 GB
\m#######.nsf  1.3 GB    520.8 MB  3919    3919    1616    792.5 MB  252.9 MB  539.6 MB
l\m######.nsf  1.4 GB    793.1 MB  2678    2678    760     662.4 MB  157.0 MB  505.5 MB
\M#######.nsf  3.6 GB    1.5 GB    8231    8231    2493    2.1 GB    374.9 MB  1.7 GB
\m#######.nsf  375.5 MB  338.0 MB  849     849     283     37.5 MB   3.8 MB    33.7 MB
ail\m####.nsf  6.6 GB    3.3 GB    11822   11822   4399    3.3 GB    974.0 MB  2.4 GB
\m#######.nsf  116.5 MB  114.4 MB  16      16      9       2.1 MB    1.6 MB    0.4 KB
l\M######.nsf  902.0 MB  663.9 MB  505     505     90      238.1 MB  47.3 MB   190.8 MB
\m#######.nsf  1.5 GB    514.2 MB  2180    2180    558     1.0 GB    124.9 MB  905.6 MB
il\m#####.nsf  9.9 GB    3.1 GB    15220   15220   4370    6.8 GB    1.4 GB    5.3 GB
il\p#####.nsf  2.8 GB    718.7 MB  5235    5235    1875    2.1 GB    548.3 MB  1.5 GB
il\p#####.nsf  3.9 GB    1.6 GB    12173   12173   3303    2.2 GB    508.8 MB  1.7 GB
l\p######.nsf  4.1 GB    718.8 MB  4894    4894    1333    3.4 GB    685.4 MB  2.7 GB
o########.nsf  2.1 MB    1.9 MB    45      45      0       0.3 KB    0.0 KB    0.3 KB
\p#######.nsf  1.1 GB    862.8 MB  922     922     336     281.0 MB  59.9 MB   221.0 MB
\r#######.nsf  1.5 GB    862.4 MB  1437    1437    241     660.1 MB  195.7 MB  464.3 MB
l\r######.nsf  2.1 GB    1.1 GB    3462    3462    918     1.0 GB    226.6 MB  805.5 MB
l\r######.nsf  515.3 MB  204.8 MB  891     891     266     310.4 MB  43.7 MB   266.8 MB
ail\r####.nsf  8.8 GB    2.0 GB    79577   79577   20768   6.9 GB    2.1 GB    4.8 GB
\r#######.nsf  4.8 GB    1.9 GB    14850   14850   6602    2.9 GB    1.1 GB    1.8 GB
il\r#####.nsf  1.8 GB    903.4 MB  3956    3956    848     936.4 MB  128.1 MB  808.3 MB
\r#######.nsf  5.4 GB    2.5 GB    11858   11858   3114    2.9 GB    791.9 MB  2.1 GB
mail\r###.nsf  263.8 MB  243.5 MB  146     146     12      20.2 MB   1.9 MB    18.3 MB
\r#######.nsf  17.8 GB   2.6 GB    99898   99898   30967   15.2 GB   4.7 GB    10.6 GB
il\r#####.nsf  2.6 GB    844.5 MB  1902    1902    649     1.8 GB    444.2 MB  1.4 GB
\s#######.nsf  3.2 GB    1004.2 M  6299    6299    2955    2.2 GB    681.7 MB  1.5 GB
\s#######.nsf  3.2 GB    1.6 GB    7594    7594    3305    1.6 GB    532.4 MB  1.1 GB
l\s######.nsf  3.9 GB    1.7 GB    5699    5699    1840    2.2 GB    527.5 MB  1.7 GB
l\s######.nsf  3.8 GB    1.3 GB    3953    3953    1221    2.6 GB    476.7 MB  2.1 GB
l\s######.nsf  8.4 GB    2.7 GB    1687    1687    297     5.7 GB    1.8 GB    3.9 GB
ail\s####.nsf  3.8 GB    1.7 GB    6453    6453    1747    2.1 GB    388.1 MB  1.7 GB
il\t#####.nsf  672.3 MB  219.1 MB  1756    1756    670     453.1 MB  133.5 MB  319.6 MB
l\t######.nsf  4.0 GB    3.2 GB    1095    1095    242     764.4 MB  177.4 MB  587.0 MB
mail\t###.nsf  2.6 GB    1.0 GB    3306    3306    1369    1.6 GB    556.5 MB  1.1 GB
\t#######.nsf  2.4 GB    1.9 GB    3544    3544    1497    480.5 MB  134.9 MB  345.6 MB
il\t#####.nsf  6.1 GB    3.2 GB    9243    9243    3131    2.9 GB    603.0 MB  2.3 GB
\t#######.nsf  6.5 GB    2.6 GB    8155    8155    2801    3.9 GB    892.3 MB  3.0 GB
il\t#####.nsf  3.5 GB    1.4 GB    1999    1999    611     2.0 GB    374.6 MB  1.6 GB
il\t#####.nsf  2.1 GB    559.0 MB  4692    4692    1829    1.5 GB    282.7 MB  1.3 GB
l\t######.nsf  4.8 GB    1.0 GB    3123    3123    832     3.8 GB    434.4 MB  3.4 GB
ail\w####.nsf  2.6 GB    383.8 MB  2377    2377    894     2.3 GB    344.2 MB  1.9 GB
\weis####.nsf  2.6 GB    567.5 MB  2277    2277    710     2.0 GB    512.9 MB  1.5 GB
il\w#####.nsf  2.8 GB    514.5 MB  3956    3956    1261    2.3 GB    404.7 MB  1.9 GB
il\w#####.nsf  5.8 GB    661.1 MB  7569    7569    2423    5.2 GB    1.4 GB    3.8 GB
mail\y###.nsf  1.4 GB    780.5 MB  3452    3452    1473    676.5 MB  162.5 MB  514.0 MB
ail\z####.nsf  11.8 MB   9.7 MB    8       8       0       2.0 MB    0.0 KB    2.0 MB

For the first database:

l\k######.nsf  3.1 GB    989.8 MB  6473    6473    2203    2.2 GB    424.7 MB  1.7 GB

k######.nsf is 3.1 GB on disk. The approximate size after DAOS is enabled would be 989.8 MB. There are 6473 attachments in the database, all of which are eligible for DAOS. There are 2203 duplicate files representing 424.7 MB. The amount of disk space needed for the DAOS attachments is 1.7 GB. As a check, 1.7 GB + (424.7 MB + 989.8 MB)/1024 is approximately 3.1 GB. The small difference comes from the rounding that takes place in converting everything to KB and again in rounding for display.

Each database that the tool ran against is displayed. Remember that, at this point, the DAOS savings are due only to duplicate attachments within the individual database, not across databases.

The next section in the output is the Summary. It contains information across all databases against which the estimator was run. Example:

Summary:
Total DB's analyzed: 60
Total DB's skipped due to errors: 0
Total Size of NSF's Examined: 188.2 GB
Total Attachments found: 429864
Total Duplicate Attachments found: 194499
Total DAOS Eligible Attachments: 429864
Estimated Size of DAOSified NSF's: 67.5 GB
Estimate Size of DAOS dir: 90.8 GB
Total Disk Savings: 38.8 GB

Compression Statistics:
None: 257877
Huffman: 150278
LZ1: 21704
Huffman on LZ1 servers: 0

For the above, a total of 60 databases were analyzed, and all 60 could be opened and analyzed. The number of databases skipped due to errors is important for judging how accurate the results are. For example, if half the databases could not be analyzed, the results could be way off.

The total attachments found is the total number across all databases, including duplicates. There were 194,499 duplicates found across all the databases.
The Estimated Size of DAOSified NSF's is the estimated size of all the databases that the tool was run against. The Estimate Size of DAOS dir is the estimated size of all the DAOS-eligible attachments found, excluding duplicates. The Total Disk Savings is the total amount of disk space saved by eliminating duplicates across all databases. The compression statistics are provided for informational purposes only.

Histogram:

The histogram is provided to give a quick graphical representation of the distribution of the attachment sizes across all the databases.

==============================================================================
|                 Size Distribution of All Attachments Found                 |
==============================================================================
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|                                         |
|                           |161416|90946 |                                  |
|                           |161416|90946 |                                  |
|                           |161416|90946 |                                  |
|66362 |                    |161416|90946 |                                  |
|66362 |                    |161416|90946 |                                  |
|66362 |                    |161416|90946 |                                  |
|66362 |                    |161416|90946 |                                  |
|66362 |                    |161416|90946 |                                  |
|66362 |                    |161416|90946 |                                  |
|66362 |      |33629 |35311 |161416|90946 |                                  |
|66362 |      |33629 |35311 |161416|90946 |                                  |
|66362 |19744 |33629 |35311 |161416|90946 |18550 |                           |
|66362 |19744 |33629 |35311 |161416|90946 |18550 |                           |
|66362 |19744 |33629 |35311 |161416|90946 |18550 |                           |
|66362 |19744 |33629 |35311 |161416|90946 |18550 | 3458 | 426  |  22  |  0   |
|      |      |      |      |      |      |      |      |      |      |      |
==============================================================================
| 0.0% | 0.1% | 0.3% | 0.6% | 7.5% |20.4% |31.5% |25.1% |11.4% | 3.2% | 0.0% |
==============================================================================
|  4k  |  8k  | 16k  | 32k  | 64k  | 1MB  | 5MB  | 20MB | 100MB| 1GB  | >1GB |
==============================================================================
- Histogram shows the number of attachments contained in each bucket.
- Percentages are the percent of total disk space of all attachments per bucket.

The numbers and the height of the columns represent the number of attachments that fall into each bucket. For the above data, there are 66,362 attachments that are between 0 and 4KB in size. The percentage below each column is the percentage of attachment disk space that those attachments use.

This data can help you decide what the best DAOS minimum size would be for your environment. The idea is to maximize the disk space savings while minimizing the number of files.

The final section restates the relationship between the DAOS minimum size and its effect on the number of files and the amount of disk space used. This helps to determine the optimum DAOS minimum size for your environment.

DAOS Minimum Size versus number of NLO's and Disk Space:

0.0 KB will result in 429864 .nlo files using 120.7 GB
4.0 KB will result in 363502 .nlo files using 120.6 GB
8.0 KB will result in 343758 .nlo files using 120.5 GB
16.0 KB will result in 310129 .nlo files using 120.2 GB
32.0 KB will result in 274818 .nlo files using 119.4 GB
64.0 KB will result in 113402 .nlo files using 110.4 GB
1.0 MB will result in 22456 .nlo files using 85.8 GB
5.0 MB will result in 3906 .nlo files using 47.9 GB
20.0 MB will result in 448 .nlo files using 17.6 GB
100.0 MB will result in 22 .nlo files using 3.8 GB

So for this server the optimum DAOS minimum size may be 64KB, because it would include about 91% of the disk space occupied by attachments (110.4 GB of 120.7 GB) while eliminating the need for 316,462 NLO files.

DAOS Estimator Options

-i
The -i switch takes a filename which contains a list of databases to be analyzed. The file may contain absolute paths as well as paths relative to the data directory. Each name must be followed by a carriage return.
For example, create a file called files.ind that contains the locations of the databases:

/local/notesdata/mail/UserA.nsf
/local/notesdata/mail/UserB.nsf
/local/notesdata/mail/UserC.nsf
/local/notesdata/mail/UserD.nsf

-c
The -c switch causes the DAOS Estimator to write the attachment data out to a delimited text file to be analyzed later with the -a switch. Note that all information except the duplicate attachment data is calculated and displayed in this mode. Using this switch reduces the time to run on the server by as much as 65%. Results will vary.

-a
There are two ways to use the -a switch.
1. The -a switch takes the filename of a .csv file containing the attachment data which was generated by a previous run using the -c switch.
2. When a filename with the extension '.ind' is passed in, the DAOS Estimator assumes that the file contains a list of .csv files to be processed. This allows you to divide the databases into several smaller runs using the -i and -c switches and then process them all together to get results across the whole set of databases. Also note that each file name must be followed by a carriage return.

-p
Estimate mode. The -p switch takes a percent value between 1 and 99 and uses this value to determine whether or not to run on each database. The default value is 50%. The DAOS Estimator runs over all the databases specified, analyzing that percentage of them, and then extrapolates the results of the run to the full set of databases. This mode is meant to produce an estimate much faster, which is useful for large data sets.

How to select the “Minimum size of object before Domino will store in DAOS” using the DAOS Estimator output

a. When considering the minimum participation size, it is necessary to know the block size of the file system.
Here is how to determine the block size:

Platform                       Command                                    Block size reported as
=============================  =========================================  ======================
Windows NTFS                   fsutil fsinfo ntfsinfo                     Bytes Per Cluster
Solaris                        df -g                                      Block Size
AIX (need to be super user)    lsfs -q                                    Block Size
Linux (need to be super user)  df -k (determine device name)              Block Size
                               dumpe2fs | grep 'Block Size'

About block size: The smaller the block size, the less waste. Since it is unlikely that all the NLO files will be exact multiples of the block size, there will be some waste. To reduce waste, the minimum participation size needs to be a multiple of the file system block size.

It should also be noted that smaller block sizes, which are beneficial for NLO files, are not beneficial for NSF files. With NSF files, the larger the block size, the better the disk performance, because NSF files are larger than NLO files. Consider creating a separate file system for the NLO files.

b. Selecting the minimum participation size using the "DAOS Minimum Size versus number of NLOs and Disk Space" section.

daosest reports a section with the number of NLO files and the total NLO disk space that would be generated given minimum participation sizes of 0, 64KB, 128KB, 256KB, 512KB, 1MB, 2MB, 3MB, 4MB and 8MB. Here is an example of the section:

0.0 KB will result in 2226347 .nlo files using 185.5 GB
64.0 KB will result in 1092894 .nlo files using 175.7 GB
128.0 KB will result in 708403 .nlo files using 163.6 GB
256.0 KB will result in 422087 .nlo files using 145.9 GB
512.0 KB will result in 219833 .nlo files using 120.2 GB
1.0 MB will result in 93628 .nlo files using 87.8 GB
2.0 MB will result in 36576 .nlo files using 56.6 GB
3.0 MB will result in 17499 .nlo files using 38.0 GB
4.0 MB will result in 9717 .nlo files using 26.3 GB
8.0 MB will result in 1576 .nlo files using 6.5 GB

The theoretical maximum (a minimum size of 0) would generate approximately 2.2 million NLO files using 185 GB. Reviewing the information, a value in the range of 128KB-256KB would be recommended as the minimum participation size.
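The comparison behind this recommendation can be sketched in a few lines of Python. This is only an illustration, not part of daosest: the function names are hypothetical, and the regular expression assumes the report lines look exactly like the example rows above. It expresses each candidate minimum size as a share of the file count and disk space of the 0 KB row.

```python
import re

# Rows copied from the example section above; substitute your own daosest output.
REPORT = """\
0.0 KB will result in 2226347 .nlo files using 185.5 GB
64.0 KB will result in 1092894 .nlo files using 175.7 GB
128.0 KB will result in 708403 .nlo files using 163.6 GB
256.0 KB will result in 422087 .nlo files using 145.9 GB
512.0 KB will result in 219833 .nlo files using 120.2 GB
1.0 MB will result in 93628 .nlo files using 87.8 GB
2.0 MB will result in 36576 .nlo files using 56.6 GB
3.0 MB will result in 17499 .nlo files using 38.0 GB
4.0 MB will result in 9717 .nlo files using 26.3 GB
8.0 MB will result in 1576 .nlo files using 6.5 GB
"""

# Assumed line shape: "<size> <unit> will result in <count> .nlo files using <space> <unit>"
ROW = re.compile(
    r"([\d.]+) (KB|MB) will result in (\d+) \.nlo files using ([\d.]+) (KB|MB|GB)"
)
KB_PER_UNIT = {"KB": 1, "MB": 1024, "GB": 1024 * 1024}


def parse_rows(text):
    """Return (minimum size in KB, .nlo file count, disk space in KB) per row."""
    rows = []
    for m in ROW.finditer(text):
        min_kb = float(m.group(1)) * KB_PER_UNIT[m.group(2)]
        files = int(m.group(3))
        space_kb = float(m.group(4)) * KB_PER_UNIT[m.group(5)]
        rows.append((min_kb, files, space_kb))
    return rows


def relative_to_maximum(rows):
    """Express each row as a fraction of the 0 KB row (the theoretical maximum)."""
    base = next(r for r in rows if r[0] == 0)
    return [(min_kb, files / base[1], space / base[2]) for min_kb, files, space in rows]


if __name__ == "__main__":
    for min_kb, file_share, space_share in relative_to_maximum(parse_rows(REPORT)):
        print(f"{min_kb:8.0f} KB: {file_share:6.1%} of files, {space_share:6.1%} of space")
```

Reading the shares side by side makes it easier to spot the point where the file count drops sharply while most of the disk space is still covered.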
Between 128KB and 256KB, there should be approximately 500,000 NLO files, which would take up about 150GB of space. That is a little less than a quarter of the number of NLO files that the theoretical maximum would require, while the disk space would be 80% of the maximum total size. At 80% of the theoretical maximum benefit with only 25% of the files, there would be two additional benefits: 1) the disk backup would perform better and 2) the DAOS resync operation would be faster. DAOS resync has to enumerate all of the NLO files in the system as one of its steps, and the fewer files there are, the faster that part will run.

Another consideration is the file system block size. With a random assortment of file sizes, there will on average be waste at the rate of half a block per file. At a minimum size of 64KB, there will be about 1 million files. Assuming an 8KB block size, the wasted space then works out to about 4GB (4KB * 1 million files). At 256KB, the wasted space would be approximately 1.7GB (4KB * 422,000 files). Again, fewer files are better because there is less wasted space.

Lastly, if the yield is not as good as expected, it is much easier to tune the minimum participation size smaller than it is to tune it the other way (and clean up) if it yields too many NLO files.
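The block-size arithmetic above can be reproduced with a small Python sketch. It is illustrative only: the function name is hypothetical, the file counts come from the example report in this section, and the half-a-block-per-file figure is the averaging assumption stated in the text, not a measurement.

```python
def estimated_waste_bytes(file_count, block_size_bytes):
    """Expected slack space, assuming each file wastes half a block on average."""
    return file_count * block_size_bytes // 2


BLOCK_SIZE = 8 * 1024  # 8 KB block size, as assumed in the text

# .nlo file counts taken from the example report (64 KB and 256 KB rows)
NLO_COUNTS = {"64 KB": 1092894, "256 KB": 422087}

if __name__ == "__main__":
    for minimum, count in NLO_COUNTS.items():
        waste_gb = estimated_waste_bytes(count, BLOCK_SIZE) / 10**9
        print(f"minimum {minimum}: about {waste_gb:.1f} GB of slack space")
```

Substituting the block size reported for your own file system and the file counts from your own report gives a quick way to compare candidate minimum sizes.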