Ephorus Integration Kit

Author: Robin Hildebrand
Version: 2.0
Date: May 9, 2007
History

Version  Author            Comment
v1.1     Remco Verhoef     Created.
v1.2     Robin Hildebrand  Single Sign On (removed in v1.7).
v1.3     Robin Hildebrand  Reporting service extended.
v1.4     Robin Hildebrand  HandInService extended with ProcessType.
v1.5     Robin Hildebrand  Reporting service version added with student info in the result, including the result of the original in case of a duplicate document.
v1.6     Robin Hildebrand  Web service added for showing/hiding documents from the Ephorus index.
v1.7     Robin Hildebrand  Web service added for retrieving original documents.
v1.8     Robin Hildebrand  Reporting service version added that sends all the profile reports and adds extra information when the document is a duplicate.
v1.9     Cornelis Richter  Introduction added. Hand-in service clarified.
v2.0     Robin Hildebrand  Web service description for retrieving original documents extended.
Table of contents

History
Table of contents
1. Introduction
2. Abbreviations
3. System Architecture
4. Preparation & hand-in codes
5. HandInService
   5.1. Notes
   5.2. UploadDocument
      5.2.1. Request
      5.2.2. Response
      5.2.3. Supported file types
6. Ephorus Reporting Service v1
   6.1. Authentication
   6.2. Encryption
7. Ephorus Reporting Service v2
   7.1. Authentication
   7.2. Encryption
8. Ephorus Reporting Service v3
   8.1. Authentication
   8.2. Encryption
9. IndexDocumentService
   9.1. Notes
   9.2. IndexDocument
      9.2.1. Request
      9.2.2. Response
10. GetOriginalDocumentService
   10.1. Notes
   10.2. GetOriginalDocument
      10.2.1. Request
      10.2.2. Response
11. Example document comparison
12. Appendix
   12.1. Web Service Description Language files
   12.2. Sample document comparison xml
   12.3. Sample document comparison xslt stylesheet
   12.4. Sample summary xslt stylesheet
   12.5. Sample Webservice HandInDocument
   12.6. Sample Webservice EphorusResults
1. Introduction

System integrators can extend their application with plagiarism reporting functionality using Ephorus Connected. With this service they can help their customers prevent plagiarism. Plagiarism occurs when a student copies work from another student or the Internet and presents it as their own. Ephorus Connected is based on the SOAP protocol, so it can be used from any programming language and platform.

This kit contains information about the implementation of Ephorus Connected, which makes it possible to upload documents, compare them and retrieve the results from any application. Ephorus developed several web services in order to provide the full functionality of Ephorus.

The Hand-in Service can be used to send documents to Ephorus. To identify the sender you should use a hand-in code, which is provided by Ephorus. When a document is successfully sent to Ephorus, the sender receives a unique ID (guid) for the document, generated by Ephorus.

The Ephorus Reporting Service handles the delivery of the reports. Ephorus processes the document and, when done, delivers a report to a location known in advance for each organization. The organization must have a web service that is publicly available. If necessary this can be protected by a firewall, with only the Ephorus IP addresses allowed.

Ephorus can send three different types of reports (v1, v2 and v3). The Ephorus Reporting Service v2 is an extension of v1, and v3 is an extension of v2. In v2 some extra information is sent about the owner of the document, which is useful when the document handed in is a duplicate (if the uploaded document is a duplicate, the results of the original document will be sent). In v3 three types of reports are sent: a strict, a standard and a compliant version. We recommend you use the latest version of the Ephorus Reporting Service.

Note: it is possible that Ephorus sends out a report more than once.

Since December 27, 2006 Ephorus customers can make use of database pools.
If a document contains similarities with a document from another organization that is a member of the same database pool, Ephorus will report this but, for privacy reasons, will not report detailed information on this source. What does this mean for the Ephorus response? The nodes url, original_guid, student_name and student_number will be empty.

With the Index Document Service you can add a document to, or remove it from, the index of your database. This is very useful when you want the student to revise his/her work. Since you do not want to report possible plagiarism between the different versions of the same document, you can remove the first version of the document from the index before you hand in the new version.

The Get Original Document Service enables you to request a document from your Ephorus database. Since you store your documents yourself, you will not always have to use this web service to access a document. However, it might be useful when a client has been using Ephorus for some time via our web application before using it in an integrated manner. In that case possible plagiarism can be reported between a new document and a document that was not uploaded via your application, so you do not have the document we report as a source document.

In the following chapters the web services will be discussed in detail.
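Because a report may be delivered more than once, the receiving side can treat delivery as idempotent. The following is a minimal sketch under stated assumptions: reports are keyed by document_guid and kept in memory, and a result whose url, original_guid, student_name and student_number are all empty is taken to come from another member of the database pool. The ReportStore class and is_pool_source function are hypothetical names used for illustration; they are not part of Ephorus Connected.

```python
from dataclasses import dataclass, field


@dataclass
class ReportStore:
    """Hypothetical in-memory store; a real system would use its own database."""
    reports: dict = field(default_factory=dict)

    def save(self, document_guid: str, report: dict) -> bool:
        """Store a report and return True only the first time a guid is seen."""
        is_new = document_guid not in self.reports
        # A repeated delivery overwrites the stored copy instead of adding a second one.
        self.reports[document_guid] = report
        return is_new


def is_pool_source(result: dict) -> bool:
    """Assumption: all four source nodes are empty for a hit in another pool member's documents."""
    keys = ("url", "original_guid", "student_name", "student_number")
    return all(not result.get(key) for key in keys)
```

Keying on document_guid means a second delivery of the same report replaces the first rather than producing a duplicate entry in the learning system.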
2. Abbreviations

HTTPS  HyperText Transfer Protocol Secure
IP     Internet Protocol
SOAP   Simple Object Access Protocol
URL    Uniform Resource Locator
XML    Extensible Markup Language
XSL    Extensible Stylesheet Language
3. System Architecture

The image below shows the system architecture. The customer zone contains the e-learning system. This is the system the students and teachers connect to. The Ephorus zone contains the Ephorus web servers, the Ephorus Detection Service and the Ephorus Reporting Service. The arrows show the communication between the learning system and Ephorus.
4. Preparation & hand-in codes

Each customer must be registered with Ephorus. The customer needs a code to be able to access the Ephorus services. This code is the hand-in code. Each customer can have multiple codes, and each code contains a customer profile. This profile can be used to change the settings of the Ephorus Detection Service. A high profile will produce fewer results than a low profile. For details see the Ephorus manuals at http://www.ephorus.com/manuals.html.
5. HandInService

The HandInService is used to upload documents for plagiarism detection. Documents uploaded using this service are processed automatically.

5.1. Notes

- Upload one document per request;
- The detection profile depends on the hand-in code;
- Do not batch document uploads; send each document as soon as it is uploaded to the learning system;
- Each uploaded document must be unique. To protect database integrity, duplicate documents will be rejected;
- Maximum upload size is 16MB.

5.2. UploadDocument

When uploading a document, the learning system sends a SOAP request to the Ephorus Upload Document web service. The response contains an Ephorus document unique identifier.

5.2.1. Request

The learning system sends a SOAP request to the HandInService. This request contains the following data. Fields marked with an asterisk are required and must contain data (not null). The maximum length of a field is given between brackets.

Field                      Type          Description
Code (*)                   String        The unique hand-in code to hand the document in to.
Student number (*)         String (25)   Student number of the student who turned in the document.
First name of student (*)  String (25)   First name of the student who turned in the document.
Middle name of student     String (10)   Middle name of the student who turned in the document.
Surname of student (*)     String (25)   Surname of the student who turned in the document.
Email                      String (75)   Email address of the student who turned in the document.
Comments                   String (500)  Not used.
Filename (*)               String        The name of the uploaded file.
Base64 coded file (*)      Base64Binary  The uploaded file itself.
ProcessType                Integer       The type for processing the document (see the values below).

ProcessType values:

1. Visible, do check (default): the document will be checked for plagiarism and will also be placed in the index.
2. Visible, do not check: the document won't be checked for plagiarism but will be placed in the index.
3. Invisible, do check: the document will be checked for plagiarism but won't be placed in the index.

5.2.2. Response

The SOAP response contains the unique identifier of the uploaded document.
This identifier must be saved at the learning system itself; it can be used for referencing the document later on. If any error occurs, a SOAP exception will be returned. We strongly recommend that you catch these errors, since they help with debugging.

5.2.3. Supported file types

The following file types are supported by Ephorus (file extension between brackets):

- Microsoft Word (doc)
- Plain text (txt)
- Rich text (rtf)
- Open Office (sxw & odt)
- Adobe Acrobat (pdf)
- HyperText Markup Language (html/htm)
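The request rules above (required fields, maximum lengths, the 16MB limit, the supported extensions and the Base64 encoding of the file) can be checked before any SOAP call is made. This is a minimal sketch, not the Ephorus client: the function name build_upload_request and the dictionary keys are hypothetical, the real parameter names come from the WSDL, and the sketch assumes the 16MB limit applies to the file before encoding.

```python
import base64
from pathlib import PurePath

MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # assumption: limit applies to the raw file
SUPPORTED_EXTENSIONS = {"doc", "txt", "rtf", "sxw", "odt", "pdf", "html", "htm"}
# Maximum field lengths taken from the request table.
MAX_LENGTHS = {"student_number": 25, "first_name": 25, "middle_name": 10,
               "surname": 25, "email": 75}
REQUIRED = ("code", "student_number", "first_name", "surname")


def build_upload_request(filename: str, content: bytes,
                         process_type: int = 1, **fields: str) -> dict:
    """Validate the inputs and return the request data as a plain dict."""
    for name in REQUIRED:
        if not fields.get(name):
            raise ValueError(f"{name} is required")
    for name, limit in MAX_LENGTHS.items():
        if len(fields.get(name, "")) > limit:
            raise ValueError(f"{name} is longer than {limit} characters")
    extension = PurePath(filename).suffix.lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension!r}")
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 16MB upload limit")
    if process_type not in (1, 2, 3):
        raise ValueError("process_type must be 1, 2 or 3")
    # The file travels as Base64 text (Base64Binary in the request table).
    return {**fields, "filename": filename,
            "file": base64.b64encode(content).decode("ascii"),
            "process_type": process_type}
```

Rejecting a bad request locally gives a clearer error message than a SOAP exception, but the SOAP exception should still be caught, since the service may apply checks this sketch does not know about.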
6. Ephorus Reporting Service v1

When Ephorus has finished the detection process, the Ephorus Reporting Service connects to the learning system web service and sends a SOAP request containing the report of the plagiarism detection. The Reporting Service will send a report for every document uploaded. When the system is not available, it will try again every five minutes.

The report contains the following data:

report
    document_guid
    student_number
    student_name
    document_subject
    document_date
    document_percentage
    status
    status_description
    profile
    summary
    results
        result (zero or more)
            result_guid
            url
            mimetype
            type
            percent
            diff
            original_guid
Field                Type     Description
document_guid        String   The Ephorus document unique identifier; the system can use this value to relate the report to the uploaded document.
student_number       String   Student number of the student who turned in the document.
student_name         String   Name (first middle last) of the student who turned in the document.
document_subject     String   Subject of the document.
document_date        String   The submit date of the document (dd-mm-yyyy hh:mm).
document_percentage  Integer  The total plagiarism percentage.
status               Integer  The status of the document: 1: Ok, 2: Duplicate, 3: Error.
status_description   String   Extended description of the status.
profile              String   Strict / standard / compliant; the profile used for checking.
summary              XmlNode  Summary report in XML, which can be transformed using an XSL template.
result_guid          String   Result unique identifier.
url                  String   The URL where the hit was found.
mimetype             String   Mime type of the original document.
type                 String   Local / internet (whether the hit was found in the local or in the internet database).
percent              Integer  Match percentage.
diff                 XmlNode  XML document, which can be transformed using an XSL template. This is a comparison of the original and the uploaded document.
original_guid        String   The unique identifier of the original document when detected by the local search.

The web service must return an OK if the data is successfully received, or a SOAP exception in case of an error.

6.1. Authentication

The reports are delivered to a location known in advance for each organization. The organization must have a web service that is publicly available. If necessary this can be protected by a firewall, with only the Ephorus IP addresses allowed.

6.2. Encryption

If needed, the connection can be secured using the HTTPS protocol. When HTTPS is used, all communication will be encrypted.
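On the receiving side, the report fields listed above have to be read out of the incoming XML. The sketch below assumes the report arrives as a plain XML element named report whose children carry the field names from the table; the actual SOAP envelope and namespaces are defined by the WSDL and are not handled here. The function name parse_report is hypothetical.

```python
import xml.etree.ElementTree as ET

# Status codes from the field table.
STATUS_NAMES = {1: "Ok", 2: "Duplicate", 3: "Error"}


def parse_report(xml_text: str) -> dict:
    """Read the main report fields and the list of results."""
    report = ET.fromstring(xml_text)
    results = []
    # "result" may occur zero or more times inside "results".
    for result in report.iterfind("results/result"):
        results.append({
            "result_guid": result.findtext("result_guid"),
            "url": result.findtext("url"),
            "type": result.findtext("type"),
            "percent": int(result.findtext("percent")),
        })
    return {
        "document_guid": report.findtext("document_guid"),
        "percentage": int(report.findtext("document_percentage")),
        "status": STATUS_NAMES[int(report.findtext("status"))],
        "results": results,
    }
```

The document_guid in the parsed report is the value returned by the HandInService at upload time, so it can be used to look up the submission in the learning system.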
7. Ephorus Reporting Service v2

When Ephorus has finished the detection process, the Ephorus Reporting Service connects to the learning system web service and sends a SOAP request containing the report of the plagiarism detection. The Reporting Service will send a report for every document uploaded. When the system is not available, it will try again every five minutes. If the uploaded document is a duplicate, the results of the original document will be sent.

The report contains the following data:

report
    document_guid
    student_number
    student_name
    document_subject
    document_date
    document_percentage
    status
    status_description
    profile
    summary
    results
        result (zero or more)
            result_guid
            url
            mimetype
            type
            percent
            diff
            original_guid
            student_number
            student_name
Field                Type     Description
document_guid        String   The Ephorus document unique identifier; the system can use this value to relate the report to the uploaded document.
student_number       String   Student number of the student who turned in the document.
student_name         String   Name (first middle last) of the student who turned in the document.
document_subject     String   Subject of the document.
document_date        String   The submit date of the document (dd-mm-yyyy hh:mm).
document_percentage  Integer  The total plagiarism percentage.
status               Integer  The status of the document: 1: Ok, 2: Duplicate, 3: Error.
status_description   String   Extended description of the status.
profile              String   Strict / standard / compliant; the profile used for checking.
summary              XmlNode  Summary report in XML, which can be transformed using an XSL template.
result_guid          String   Result unique identifier.
url                  String   The URL where the hit was found.
mimetype             String   Mime type of the original document.
type                 String   Local / internet (whether the hit was found in the local or in the internet database).
percent              Integer  Match percentage.
diff                 XmlNode  XML document, which can be transformed using an XSL template. This is a comparison of the original and the uploaded document.
original_guid        String   The unique identifier of the original document when detected by the local search.
student_number       String   Student number of the found result.
student_name         String   Student name (first middle last) of the found result.

The web service must return an OK if the data is successfully received, or a SOAP exception in case of an error.

7.1. Authentication

The reports are delivered to a location known in advance for each organization. The organization must have a web service that is publicly available. If necessary this can be protected by a firewall, with only the Ephorus IP addresses allowed.

7.2. Encryption

If needed, the connection can be secured using the HTTPS protocol. When HTTPS is used, all communication will be encrypted.
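Two of the fields above usually need conversion before they are shown to a teacher: document_date is a string in dd-mm-yyyy hh:mm form, and each v2 result carries the student fields of the matching document. A minimal sketch, assuming the result has already been read into a dictionary keyed by the field names; both function names are hypothetical, and the time zone of document_date is not stated in this kit, so the parsed value is left without one.

```python
from datetime import datetime


def parse_document_date(value: str) -> datetime:
    """Parse the dd-mm-yyyy hh:mm format used by document_date."""
    return datetime.strptime(value, "%d-%m-%Y %H:%M")


def describe_source(result: dict) -> str:
    """Return a short label for the source of a hit."""
    # "type" is Local or internet; compare case-insensitively.
    if (result.get("type") or "").lower() == "internet":
        return f"internet: {result.get('url') or ''}"
    # For local hits, v2 adds the student fields of the matching document.
    name = result.get("student_name") or "unknown student"
    return f"local: {name}"
```

The fallback to "unknown student" covers the database pool case described in the introduction, where the student fields of the source are left empty.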
8. Ephorus Reporting Service v3

When Ephorus has finished the detection process, the Ephorus Reporting Service connects to the learning system web service and sends a SOAP request containing the report of the plagiarism detection. The Reporting Service will send a report for every document uploaded. When the system is not available, it will try again every five minutes. If the uploaded document is a duplicate, the results of the original document will be sent, together with the unique identifier, student name and student number of the original document. This version also sends the results for all profiles (strict, standard and compliant).

The report contains the following data:

report
    document_guid
    student_number
    student_name
    document_subject
    document_date
    document_percentage
    duplicate_original_guid
    duplicate_student_name
    duplicate_student_number
    status
    status_description
    profiles
        profile (zero to three)
            profile_type
            summary
            results
                result (zero or more)
                    result_guid
                    url
                    mimetype
                    type
                    percent
                    diff
                    original_guid
                    student_number
                    student_name
Field                     Type     Description
document_guid             String   The Ephorus document unique identifier; the system can use this value to relate the report to the uploaded document.
student_number            String   Student number of the student who turned in the document.
student_name              String   Name (first middle last) of the student who turned in the document.
document_subject          String   Subject of the document.
document_date             String   The submit date of the document (dd-mm-yyyy hh:mm).
document_percentage       Integer  The total plagiarism percentage.
duplicate_original_guid   String   When the status of the document is duplicate, the unique identifier of the original document.
duplicate_student_name    String   When the status of the document is duplicate, the student name of the original document.
duplicate_student_number  String   When the status of the document is duplicate, the student number of the original document.
status                    Integer  The status of the document: 1: Ok, 2: Duplicate, 3: Error.
status_description        String   Extended description of the status.
profile_type              String   Strict / standard / compliant; the profile used for checking.
summary                   XmlNode  Summary report in XML, which can be transformed using an XSL template.
result_guid               String   Result unique identifier.
url                       String   The URL where the hit was found.
mimetype                  String   Mime type of the original document.
type                      String   Local / internet (whether the hit was found in the local or in the internet database).
percent                   Integer  Match percentage.
diff                      XmlNode  XML document, which can be transformed using an XSL template. This is a comparison of the original and the uploaded document.
original_guid             String   The unique identifier of the original document when detected by the local search.
student_number            String   Student number of the found result.
student_name              String   Student name (first middle last) of the found result.

The web service must return an OK if the data is successfully received, or a SOAP exception in case of an error.

8.1. Authentication

The reports are delivered to a location known in advance for each organization. The organization must have a web service that is publicly available.
If necessary this can be protected by a firewall, with only the Ephorus IP addresses allowed.

8.2. Encryption

If needed, the connection can be secured using the HTTPS protocol. When HTTPS is used, all communication will be encrypted.
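Since a v3 report carries up to three profiles, a receiver may want one figure per profile. The sketch below assumes the same plain XML layout as the field list in this chapter, with results nested inside each profile element; that nesting is an assumption based on the field list, not confirmed by the WSDL. The function name percentages_by_profile is hypothetical.

```python
import xml.etree.ElementTree as ET


def percentages_by_profile(xml_text: str) -> dict:
    """Map each profile_type to the highest match percentage among its results."""
    report = ET.fromstring(xml_text)
    summary = {}
    # "profile" may occur zero to three times inside "profiles".
    for profile in report.iterfind("profiles/profile"):
        profile_type = profile.findtext("profile_type", default="").lower()
        percents = [int(result.findtext("percent"))
                    for result in profile.iterfind("results/result")]
        # A profile without results yields 0.
        summary[profile_type] = max(percents, default=0)
    return summary
```

Taking the highest single match per profile is only one possible reading; document_percentage remains the total reported by Ephorus.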
9. IndexDocumentService

The IndexDocumentService is used to show or hide an existing document in the Ephorus index. When a document is hidden in the index, it can no longer be found as a result.

9.1. Notes

- Show/hide one document per request;
- The document must exist in the database and its status must be equal to "1 Ok".

9.2. IndexDocument

9.2.1. Request

The learning system sends a SOAP request to the IndexDocumentService. This request contains the following data.

Field         Type     Description
DocumentGuid  String   The Ephorus document unique identifier. (Required)
IndexType     Integer  The type for indexing the document (see the values below).

IndexType values:

1. Show: the document will be visible in the index.
2. Hide: the document will be hidden in the index.

9.2.2. Response

The SOAP response contains the status "1 OK" if the request is handled properly. If any error occurs, a SOAP exception will be returned.
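The revision workflow from the introduction (hide the earlier version, then hand in the new one) can be sketched as follows. The two callables stand in for the real SOAP calls to IndexDocument and UploadDocument, which are not shown; resubmit and the IndexType enum are hypothetical names, and the sketch assumes the response status is available as the integer 1.

```python
from enum import IntEnum
from typing import Callable


class IndexType(IntEnum):
    """IndexType values from the request table."""
    SHOW = 1
    HIDE = 2


def resubmit(previous_guid: str,
             index_document: Callable[[str, int], int],
             upload: Callable[[], str]) -> str:
    """Hide the previous version, then hand in the revision and return its guid."""
    status = index_document(previous_guid, IndexType.HIDE)
    if status != 1:
        raise RuntimeError(f"IndexDocument returned status {status}")
    # Only upload once the old version can no longer be found as a result.
    return upload()
```

Hiding the first version before uploading avoids a report of plagiarism between two versions of the same student's work.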
10. GetOriginalDocumentService

The GetOriginalDocumentService is used to retrieve an original document from Ephorus. For security reasons the integrator must request access to this web service. Ephorus will provide the integrator with 3 certificates:

1. Ephorus CA;
2. Client certificate in PKCS #12 format;
3. Client certificate in DER format.

The first certificate should be installed in the Trusted Root Certification Authorities store. The second certificate should be installed in the Personal store. The last certificate should be passed to the web service with every request for a document.

10.1. Notes

- Retrieve one document per request;
- The document must exist in the database of the organization;
- The hand-in code should belong to one of the members of the organization;
- The client certificate must be valid. Only registered users can use this web service!

10.2. GetOriginalDocument

10.2.1. Request

The learning system sends a SOAP request to the GetOriginalDocumentService. This request contains the following data.

Field         Type          Description
Code          String        The unique hand-in code that the organization uses to hand in documents. (Required)
DocumentGuid  String        The Ephorus document unique identifier. (Required)
MimeType      String (Out)  Output parameter that returns the mime type of the document being sent.

The request must also contain the client certificate (DER format).

10.2.2. Response

The SOAP response contains the original document, Base64 encoded. If any error occurs, a SOAP exception will be returned.
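Once the response has been received, the Base64 payload has to be decoded and the MimeType output parameter can be used to pick a file extension. A minimal sketch, assuming both values are already available as strings; the SOAP call and the certificate handling are not shown, and decode_original is a hypothetical name.

```python
import base64
import mimetypes
from typing import Tuple


def decode_original(encoded: str, mime_type: str) -> Tuple[bytes, str]:
    """Decode the Base64 payload and guess a file extension from the mime type."""
    content = base64.b64decode(encoded)
    # guess_extension returns None for unknown types; fall back to a neutral suffix.
    extension = mimetypes.guess_extension(mime_type) or ".bin"
    return content, extension
```

The returned bytes can then be written to the learning system's own storage, so the document does not have to be requested again.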
11. Example document comparison

The diff document is a comparison, in XML format, of the uploaded document and a matching document. The system integrator can transform this document using XSL so that the results can be displayed in the integrator's own formatting.
12. Appendix

12.1. Web Service Description Language files

http://services.ephorus.com/handinservice/handinservice.asmx?wsdl

12.2. Sample document comparison xml

Request at Ephorus.

12.3. Sample document comparison xslt stylesheet

Request at Ephorus.

12.4. Sample summary xslt stylesheet

Request at Ephorus.

12.5. Sample Webservice HandInDocument

Request at Ephorus. Examples available in PHP, ASP and C#.NET.

12.6. Sample Webservice EphorusResults

Request at Ephorus. Examples available in PHP, ASP and C#.NET.