
Scraping Configuration


[block:callout]
{
  "type": "info",
  "title": "Helpful documentation",
  "body": "[This documentation](https://liftigniter.readme.io/docs/pinit) provides additional context for this page."
}
[/block]
Instead of using an OG tag or a `liftigniter-metadata` tag, you might want our script to scrape from an arbitrary DOM element that you pick with a selector, or from a JavaScript variable that is accessible in the global context (`window`).

For example, suppose a page contains the following HTML:
[block:code]
{
  "codes": [
    {
      "code": "<div id=\"item\"> Delicious Pasta </div>\n<a href=\"www.cooking123.com\">Author's blog</a>\n<a rel=\"author\" href=\"/italian/Chef-James/\">Chef James</a>",
      "language": "html"
    }
  ]
}
[/block]
You can scrape an element's text or one of its attributes by defining a `features` array on the `config`. For example:
[block:code]
{
  "codes": [
    {
      "code": "var customConfig = {\n  config: {\n    inventory: {\n      features: [\n        {\n          name: \"title\",\n          selector: \"div[id=item]\",\n          type: \"text\",\n          // Strip the surrounding whitespace from the scraped text\n          transform: function(value) {\n            return value.trim();\n          }\n        },\n        {\n          name: \"authorBlog\",\n          selector: \"a[rel=author]\",\n          type: \"attribute\",\n          attribute: \"href\"\n        }\n      ]\n    }\n  }\n};\n\n$p(\"init\", \"{JS_KEY}\", customConfig);",
      "language": "javascript"
    }
  ]
}
[/block]
`selector` is the CSS selector of the DOM element that contains the data you want. `name` is the field name under which the scraped value is stored in your item metadata. `type` specifies what to scrape: if it is `text`, the script grabs the `textContent` of the matched element; if it is `attribute`, the script scrapes the value of the attribute named by `attribute` (for example, the `src` of an image).

`transform` is a function that takes the scraped value and lets you convert it into the output you want before the item is shipped to our data store.

Lastly, if you wish to scrape a variable on `window`, you can define a feature as follows:
[block:code]
{
  "codes": [
    {
      "code": "// Take the value of window.metadata.pageUrl\nvar customConfig = {\n  config: {\n    inventory: {\n      features: [\n        {\n          name: \"url\",\n          type: \"var\",\n          variable: \"metadata.pageUrl\" // scrapes window.metadata.pageUrl\n        }\n      ]\n    }\n  }\n};\n\n$p(\"init\", \"{JS_KEY}\", customConfig);",
      "language": "javascript"
    }
  ]
}
[/block]
The feature object can be summarized as:
[block:parameters]
{
  "data": {
    "h-0": "Name",
    "h-1": "Type",
    "h-2": "Description",
    "0-0": "name",
    "0-1": "String",
    "0-2": "Name of the feature to be scraped.",
    "1-0": "type",
    "1-1": "String",
    "1-2": "Type of feature to be scraped: we support `text`, `attribute`, `url`, and `var`.",
    "2-0": "selector",
    "2-1": "String",
    "2-2": "CSS selector of the DOM element that you want our script to scrape. The `type` determines what is read from the matched element.",
    "3-0": "attribute",
    "3-1": "String",
    "3-2": "Attribute of the DOM element that you would like to scrape (used when `type` is `attribute`).",
    "4-0": "transform",
    "4-1": "Function (String) -> Any",
    "4-2": "Transformation function applied to the scraped value."
  },
  "cols": 3,
  "rows": 5
}
[/block]
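
To make the `var` type and `transform` more concrete, here is a minimal sketch of how a dotted `variable` path such as `metadata.pageUrl` could be resolved against a global object and then passed through `transform`. This is an illustration only, not LiftIgniter's actual implementation: `resolvePath` and `scrapeVarFeature` are hypothetical helper names, `fakeWindow` stands in for `window` so the sketch also runs outside a browser, and applying `transform` to a `var` feature is an assumption based on the table above.

```javascript
// Hypothetical helpers: an illustration only, not LiftIgniter's actual code.

// Walk a dotted path such as "metadata.pageUrl" starting from `root`.
// Returns undefined if any segment along the way is missing.
function resolvePath(root, path) {
  return path.split(".").reduce(function (current, key) {
    return current == null ? undefined : current[key];
  }, root);
}

// Resolve a feature of type "var" and apply its optional transform
// (assumption: transform is applied to "var" features as well).
function scrapeVarFeature(feature, globalObject) {
  var value = resolvePath(globalObject, feature.variable);
  if (value !== undefined && typeof feature.transform === "function") {
    value = feature.transform(value);
  }
  return { name: feature.name, value: value };
}

// Stand-in for `window`; in a browser you would pass `window` itself.
var fakeWindow = { metadata: { pageUrl: " https://example.com/pasta " } };

var urlFeature = {
  name: "url",
  type: "var",
  variable: "metadata.pageUrl",
  // Strip stray whitespace before the value is stored
  transform: function (value) {
    return value.trim();
  }
};

var scraped = scrapeVarFeature(urlFeature, fakeWindow);
console.log(scraped.name + ": " + scraped.value);
// prints "url: https://example.com/pasta"
```

Returning `undefined` for a missing path segment, rather than throwing, means that a page without the expected variable would simply produce no value for that feature in this sketch.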